Abstract. The Witt construction describes a functor from the category of
Rings to the category of characteristic 0 rings. It is uniquely determined by
a few associativity constraints which do not depend on the types of the
variables considered, in other words, by integer polynomials. This universality
allowed Alain Connes and Caterina Consani to devise an analogue of the Witt
ring for characteristic one, an attractive endeavour since we know very little
about the arithmetic in this exotic characteristic and its corresponding field
with one element. Interestingly, they found that in characteristic one, the
Witt construction depends critically on the Shannon entropy. In the current
work, we examine this surprising occurrence, defining a Witt operad for an
arbitrary information measure and a corresponding algebra we call a
thermodynamic semiring. This object exhibits algebraically many of the familiar
properties of information measures, and we examine in particular the Tsallis
and Rényi entropy functions and applications to nonextensive thermodynamics
and multifractals. We find that the arithmetic of the thermodynamic semiring
is exactly that of a certain guessing game played using the given information
measure.
1. Introduction

The past few years have seen several interesting new results focusing on various
aspects of the elusive "geometry over the field with one element", see for instance
[7], [11], [12], [32], [36], [48], among many others. The idea of
$\mathbb{F}_1$-geometry has its roots in an observation of Tits [51] that limits
as $q \to 1$ of counting functions for certain varieties defined over finite
fields $\mathbb{F}_q$ exhibit an interesting combinatorial meaning, suggesting
that the resulting combinatorial geometry should be seen as an algebraic
geometry over a non-existent "field with one element" $\mathbb{F}_1$. Part of the
motivation for developing a sufficiently refined theory of varieties and schemes
over $\mathbb{F}_1$ lies in the idea that being able to cast
$\mathrm{Spec}\,\mathbb{Z}$ in the role of a curve over a suitably defined
$\mathrm{Spec}\,\mathbb{F}_1$ may lead to finding an analog for number fields of
the Weil proof [55] of the Riemann hypothesis for finite fields.

Among the existing approaches aimed at developing various aspects of geometry
over $\mathbb{F}_1$, the one that is of direct interest to us in the present
paper is a recent construction by Connes and Consani [10], [11] of semirings of
characteristic one (a nilpotent hypothesis). These are endowed with an additive
structure that provides an analog of the Witt formula for the addition of the
multiplicative Teichmüller lifts in strict $p$-rings. As observed in [10] and
[11], the commutativity, identity, and associativity conditions for this
addition force the function used in defining the Witt sums in characteristic
one to be equal to the Shannon entropy.

The goal of this paper is to explore this occurrence of the Shannon entropy in 
the characteristic one Witt construction of [10] and [11]. In particular, we show 
here that the construction introduced in those papers can be seen as part of a more 
general theory of "thermodynamic semirings", which encodes various properties of
suitable "entropy functions" in terms of algebraic properties of the corresponding 
semirings. 

After reviewing the case of [10], [11] in §2, we present a general definition and
some basic properties of thermodynamic semirings in §3 and §4, based on the
axiomatization of information-theoretic entropy through the Khinchin axioms and
other equivalent formulations. We then give in §5 a physical interpretation of the
structure of thermodynamic semirings in terms of statistical mechanics,
distinguishing between the extensive and non-extensive cases and the cases of
ergodic and non-ergodic statistical systems. We see that the lack of
associativity of the thermodynamic semiring has a natural physical
interpretation in terms of mixing, chemical potentials, and free energy. This
generalizes the thermodynamic interpretation of certain formulas from tropical
mathematics considered in [43].
We focus then on specific examples of other important information-theoretic
entropy functions, such as the Rényi entropy, the Tsallis entropy, or the
Kullback-Leibler divergence, and we analyze in detail the properties of the
corresponding thermodynamic semirings. In §6 we consider the case of the Rényi
entropy, which is a one-parameter generalization of the Shannon entropy that
still satisfies the extensivity property. In §7 we focus instead on the Tsallis
entropy, which is a non-extensive one-parameter generalization of the Shannon
entropy, and we show that a simple one-parameter deformation of the Witt
construction of [10] and [11] identifies the Tsallis entropy as the unique
information measure that satisfies the associativity constraint.

In §8 we consider the case of the Kullback-Leibler divergence or relative entropy
(information gain), and we show that thermodynamic semirings based on this
information measure can be associated to univariate and multivariate binary
statistical manifolds, in the sense of information geometry, and to multifractal
systems, in such a way that the algebraic properties of the semirings detect the
statistical and multifractal properties of the underlying spaces. We also relate
a hyperfield structure arising from the KL divergence to those considered in [54].

We also show in §9 that the algebraic structure of the thermodynamic semirings
can be encoded in a suitably defined successor function and that the properties
of this function and its iterates as a dynamical system capture both the algebraic
structure of the semiring and the thermodynamical properties of the corresponding
entropy measure. We give explicit examples of these successor functions and their
behavior for the Shannon, Rényi, and Tsallis entropies. In §9.3 we show that this
function has an interpretation as the cumulant generating function for the energy,
which reveals some further thermodynamic details of our construction.

Finally, in §10 we phrase our construction using operads whose composition
trees suggest an interpretation in terms of "guessing games". Exploring this, we
show that relations in a particular algebra (the thermodynamic semiring) for the
guessing game operad correspond naturally to information theoretic properties of
the entropy functions, reminiscent of an operadic characterization studied recently
by Baez, Fritz and Leinster, which we review. This allows us to rephrase Connes
and Consani's original construction in a way that makes clear why the Shannon
entropy plays such a key role and provides a categorification of entropy functions.

In the last section we outline possible further directions, some of which will
eventually relate the general theory of thermodynamic semirings back to the
analogies between characteristic p and characteristic one geometries. Thus, this
point of view based on thermodynamic semirings may be regarded as yet another
possible viewpoint on $\mathbb{F}_1$-geometry, based on information theory and
statistical geometry, a sort of "cybernetic viewpoint".

1.1. Witt vectors and their characteristic one analogs. Witt vectors were
first proposed by Ernst Witt in 1936 to describe unramified extensions of the
$p$-adic numbers. In particular, Witt developed integral polynomial expressions
for the arithmetic of strict $p$-rings in terms of their residue rings.

A ring $R$ is a strict $p$-ring when $R$ is complete and Hausdorff under the
$p$-adic metric, $p$ is not a zero-divisor in $R$, and the residue ring
$K = R/pR$ is perfect [55], [H], [17]. The ring $R$ is determined by $K$ up to
canonical isomorphism, and there is a unique multiplicative section
$\tau : K \to R$ of the residue morphism $\pi : R \to K$, i.e.

$\pi \circ \tau = \mathrm{id}_K, \qquad \tau(xy) = \tau(x)\tau(y) \quad \forall x, y \in K.$

Every element $x$ of $R$ can be written uniquely as

$x = \sum_{n} \tau(x_n)\, p^n, \qquad x_n \in K.$

The $\tau(x)$ are called Teichmüller representatives.

When $K = \mathbb{F}_p$, $R = \mathbb{Z}_p$, but the Teichmüller representatives
are not $\{0, 1, \ldots, p-1\}$ as they are in the common representation of
$\mathbb{Z}_p$. Instead they are the roots of $x^p - x$. We see from this example
that the arithmetic in terms of the Teichmüller representation above is
nontrivial. The Witt formula expresses the sum of these representatives as

(1.1)   $\tau(x) + \tau(y) = \tilde\tau\Big(\sum_{\alpha \in I_p} w_p(\alpha, T)\, x^{\alpha} y^{1-\alpha}\Big),$

where $I_p = \{\alpha \in \mathbb{Q} \cap [0,1] \mid p^n \alpha \in \mathbb{Z} \text{ for some } n\}$,
$\tilde\tau : K[[T]] \to R$ is the unique map such that
$\tilde\tau(xT^n) = \tau(x)p^n$, and $w_p(\alpha, T) \in \mathbb{F}_p[[T]]$ is
independent of $R$. Note that, since $K$ is perfect, the terms
$x^{\alpha} y^{1-\alpha}$ make sense.
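To make the remark about Teichmüller representatives concrete, the following sketch computes them modulo $p^k$, i.e. truncated to $k$ $p$-adic digits. The function name `teichmuller` and the choice $p = 5$, $k = 4$ are ours, and the method (raising to the power $p^{k-1}$) is one standard way to obtain them, not a construction taken from [10], [11].

```python
def teichmuller(a: int, p: int, k: int) -> int:
    """Teichmuller representative of the residue a (mod p), computed modulo p**k.

    Iterating x -> x**p converges p-adically; k - 1 iterations suffice mod p**k,
    which is the same as raising a to the power p**(k - 1).
    """
    return pow(a, p ** (k - 1), p ** k)


p, k = 5, 4
modulus = p ** k
reps = [teichmuller(a, p, k) for a in range(p)]

# Each representative is a root of x^p - x modulo p^k and lifts its residue class.
is_root = [pow(t, p, modulus) == t for t in reps]
lifts_residue = [t % p == a for a, t in enumerate(reps)]
```

The list `reps` differs from `[0, 1, 2, 3, 4]`, which illustrates why addition of Teichmüller representatives is not ordinary digit addition.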

The idea of [10], [11] is to generalize this to characteristic one by considering
sums of the form

(1.2)   $x \oplus_w y := \sum_{\alpha \in I} w(\alpha)\, x^{\alpha} y^{1-\alpha},$

where now $I = \mathbb{Q} \cap [0,1]$, over sufficiently nice characteristic one
semirings.

According to Definition 2.7 of [11], a semiring is characteristic one when
$1 + 1 = 1$, i.e. when it is idempotent. For example, the tropical semifield
$\mathbb{T} = \mathbb{R} \cup \{-\infty\}$, with addition given by the sup and
multiplication given by normal addition, forms a well studied characteristic one
semiring in the context of tropical geometry [24],

m- 

Connes and Consani found in [10], [11] that, over a suitably nice characteristic
one semiring, $\oplus_w$ is commutative, associative, shares an identity with $+$,
and is order preserving if and only if $w(\alpha)$ is of the form

(1.3)   $w(\alpha) = \rho^{\mathrm{Sh}(\alpha)},$

where $\rho \geq 1$ and $\mathrm{Sh}(p)$ is the well known Shannon entropy

(1.4)   $\mathrm{Sh}(p) = -C\,(p \log p + (1-p)\log(1-p)),$

where we write log for the natural logarithm, and where $C > 0$ is an arbitrary
constant factor.

In this paper, we attempt to elucidate this surprising connection between the 
algebraic structure of the semiring and the information theoretic entropy by devel- 
oping a broader theory of thermodynamic semirings. 
2. Preliminary notions

We introduce here some basic facts that we will need to use in the rest of the
paper.

We start with a warning about notation. Throughout most of the paper we will
work implicitly with $\mathbb{R}^{\min,+} \cup \{\infty\}$ or
$\mathbb{R}^{\max,\times}_{\geq 0}$ in mind (note the two are isomorphic
under $-\log$). As such, we will use the notation of one of the two. Which one we
use should be clear from the context. We do this because the first will
give expressions looking more like statistical physics equations, and the second
will give expressions more similar to the Witt construction in characteristic p.
We will tend to write $\oplus_{S,T}$ (perhaps with other relevant subscripts) for
the Witt addition, to indicate that it is a modification of the additive
structure of the semiring, and that it depends on the choice of a binary
information measure (or entropy) $S$ and of a temperature parameter $T$. This is
motivated by tropical geometry, where it is customary to denote by $\oplus$ the
addition in the tropical semiring, i.e. the minimum, and by $\odot$ the
multiplication, the usual addition $+$, see [55].

2.1. Frobenius in characteristic one. We recall here, from [10], [11], the
behavior of the Frobenius action in the characteristic one setting.

Let $K$ be a commutative, characteristic one semifield. It is possible to work
in the slightly more general case of multiplicatively cancellative semirings, but
for simplicity we will forsake this generality. Recall that such a semifield is a
set with two associative, commutative binary operations, $(x, y) \mapsto x + y$
and $(x, y) \mapsto xy$, such that the second distributes over the first,
$0 + x = x$, $0x = 0$, $1x = x$, $K$ has multiplicative inverses, and,
importantly, the characteristic one condition that $1 + 1 = 1$.

The first step in developing an analog of the Witt construction is to examine the 
Frobenius map in K . 

Lemma 2.1. (Frobenius)

(2.1)   $(x + y)^n = x^n + y^n$ for every $n \in \mathbb{N}$.

Proof. The proof is given in Lemma 4.3 of [10], but we recall it here for the
convenience of the readers. One sees from the distributive property that, for
every $m \in \mathbb{N}$, one has $(x+y)^m = \sum_{k=0}^{m} x^k y^{m-k}$. This
then gives $(x^n + y^n)(x+y)^{n-1} = (x+y)^{2n-1}$. Since $K$ is multiplicatively
cancellative, this implies (2.1). □
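As a sanity check of Lemma 2.1 in the simplest case, the sketch below models the tropical semifield $\mathbb{T}$ with floats, where semiring addition is max and semiring multiplication is ordinary addition. The helper names `t_add`, `t_mul`, `t_pow` are ours and only illustrate the statement on a few sample points; $-\infty$ is omitted for simplicity.

```python
from itertools import product


def t_add(x: float, y: float) -> float:
    """Tropical addition: the sup."""
    return max(x, y)


def t_mul(x: float, y: float) -> float:
    """Tropical multiplication: ordinary addition."""
    return x + y


def t_pow(x: float, n: int) -> float:
    """n-fold tropical product of x, starting from the multiplicative unit 0.0."""
    out = 0.0
    for _ in range(n):
        out = t_mul(out, x)
    return out


samples = [-2.5, -1.0, 0.0, 0.5, 3.0]

# (x + y)^n = x^n + y^n, read in the tropical semifield.
frobenius_holds = all(
    t_pow(t_add(x, y), n) == t_add(t_pow(x, n), t_pow(y, n))
    for x, y in product(samples, repeat=2)
    for n in range(1, 6)
)
```

In ordinary notation the identity reads $n \max(x, y) = \max(nx, ny)$, which is why the Frobenius acts by scalar multiplication over $\mathbb{T}$.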

2.2. Legendre transform. As shown in Lemma 4.2 of [10], $K$ is endowed with
a natural partial ordering $\leq$ defined so that $x \leq y \Leftrightarrow x + y = y$.
This may seem strange, but one sees that, over the tropical semifield
$\mathbb{T}$, this reads $x \leq y \Leftrightarrow \max(x, y) = y$. We give $K$ the
order topology from $\leq$. Then multiplication and the Frobenius automorphisms
make $K$ a topological $\mathbb{R}_{\geq 0}$-module, since the Frobenius is
continuous and distributes over the multiplicative structure. When
$K = \mathbb{T}$, this topology is the standard one on
$[-\infty, \infty) = \mathbb{R} \cup \{-\infty\}$, with the Frobenius acting by
multiplication so that $K$ has the normal vector space structure.

We say that a function $f : X \to K$, where $X$ is a convex subset of a
topological $\mathbb{R}_{\geq 0}$-module, is convex if, for every $t \in [0,1]$,
$x_1, x_2 \in X$,

(2.2)   $f(t x_1 + (1-t) x_2) \leq f(x_1)^{t} f(x_2)^{1-t},$

with concavity being defined as convexity of the multiplicative inverse of $f$.

Note again that, over $\mathbb{T}$, this is the normal definition of convexity.

We consider also

(2.3)   $\mathrm{epi} f = \{(\alpha, r) \in X \times K \mid f(\alpha) \leq r\},$

called the epigraph of $f$. This has the following property.

Lemma 2.2. A function $f$ is convex iff the epigraph $\mathrm{epi} f$ is convex,
and $f$ is closed iff $\mathrm{epi} f$ is closed.

Proof. The topological $\mathbb{R}_{\geq 0}$-module structure on $X \times K$ is
given by the product structure, so the proof follows directly from the
definitions. □

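When $X \subset \mathbb{R}_{\geq 0}$, we can define the Legendre transform of $f$ by

(2.4)   $f^*(x) = \sum_{\alpha \in X} \frac{x^{\alpha}}{f(\alpha)}.$

Note that over $\mathbb{T}$ this reads

$\sup_{\alpha} (\alpha x - f(\alpha)),$

which is the normal definition of the Legendre transform.

When $X \subseteq K$, we can define the Legendre transform of $f$ by

(2.5)   $f^*(\alpha) = \sum_{x \in X} \frac{x^{\alpha}}{f(x)}.$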



Proposition 2.3. The Legendre transform of $f$ is closed and convex.

Proof. Suppose first $X \subset \mathbb{R}_{\geq 0}$. Let
$g_\alpha(x) = x^{\alpha}/f(\alpha)$, and let $g$ be the Legendre transform of
$f$. Then $g$ is the pointwise supremum among the $g_\alpha$, so
$\mathrm{epi}\, g = \bigcap_{\alpha \in X} \mathrm{epi}\, g_\alpha$, an
intersection of closed half spaces. Thus, $\mathrm{epi}\, g$ is closed and convex,
so $g$ is closed and convex by Lemma 2.2. The proof of the opposite case
proceeds in precisely the same manner. □

One then has the following result on Legendre transforms. 

Theorem 2.4. (Fenchel-Moreau) Let $f : X \to K$, $X \subset \mathbb{R}_{\geq 0}$.
Then the following hold.

(1) $f^{**}$ is closed and convex and bounded by $f$.

(2) $f^{**} = f$ iff $f$ is closed and convex.

Proof. The function $f^{**}$ is convex and closed by Lemma 2.2. We also see that

$x^{\alpha}/f^*(x) \leq x^{\alpha}/(x^{\alpha}/f(\alpha)) = f(\alpha),$

so taking a $\sup_{x \in K}$ of both sides yields $f^{**} \leq f$. To prove the
second fact, it suffices to show that, if $f$ is closed, convex, and finite, then
$f \leq f^{**}$. Define the subdifferential $\partial f(\alpha)$ of $f$ at
$\alpha$ by

$\partial f(\alpha) = \{x \in K \mid f(\beta) \geq f(\alpha)\, x^{\beta - \alpha} \ \forall \beta \in X\}.$

We consider the set-valued map $\alpha \mapsto \partial f(\alpha)$. To invert this
map is to find $\alpha(x) = \alpha$ such that $x \in \partial f(\alpha)$. We see
$f^*(x) = x^{\alpha(x)}/f(\alpha(x))$. Thus, the subdifferential is the proper
analog in this case for the derivative. When $f$ is closed and convex,
$\partial f(\alpha)$ is nonempty, so let $x \in \partial f(\alpha)$. Then we have

$1/f^*(x) \geq f(\alpha)/x^{\alpha} \ \Rightarrow\ f(\alpha) \leq x^{\alpha}/f^*(x) \leq f^{**}(\alpha)$

for every $\alpha$, proving the theorem. □
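Over the tropical semifield the statements above reduce to the classical Legendre transform $\sup_\alpha(\alpha x - f(\alpha))$, which can be explored on a grid. The sketch below is only an illustration under that assumption: the grids, the helper names `legendre` and `biconjugate`, and the two sample functions are ours. It shows that the biconjugate recovers a convex function and lies strictly below a non-convex one.

```python
def legendre(f_vals, points, duals):
    """Discrete Legendre transform: for each d in duals, max over points of d*a - f(a)."""
    return [max(a * d - fa for a, fa in zip(points, f_vals)) for d in duals]


alphas = [i / 20 for i in range(21)]      # grid on [0, 1]
xs = [j / 10 for j in range(-10, 31)]     # slopes; contains 2*a for every grid point a

convex = [a * a for a in alphas]
tent = [min(a, 1 - a) for a in alphas]    # concave, hence not convex


def biconjugate(f_vals):
    """Apply the discrete transform twice, returning f** on the alpha grid."""
    f_star = legendre(f_vals, alphas, xs)
    return legendre(f_star, xs, alphas)


convex_bb = biconjugate(convex)
tent_bb = biconjugate(tent)
```

For the tent function the biconjugate is the convex envelope, which vanishes identically on $[0,1]$, so the inequality $f^{**} \leq f$ is strict in the interior.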

This is a simple translation of the well-known Legendre transform machinery into 
characteristic one semirings. The idea is that since we can define a real topological 
vector space structure on K using the multiplication as addition and the Frobenius 
map as scalar multiplication (with negative reals having a well-defined action since 
K has multiplicative inverses), we have enough structure to do convex analysis. 
The point is that for concave or convex /, the above sums are invertible in K. 
From now on, any semifield satisfying the assumptions necessary for this section 
will be called "suitably nice".

2.3. Witt ring construction in characteristic one. We recall here the main 
properties of the characteristic one analog of the Witt construction [10], [11], which
is the starting point for our work. We formulate it here in terms of a general 
information measure S, whose properties we will find are related to the algebraic 
properties of the semiring. 

Let $w : [0,1] \to K$ be continuous under the order topology, and consider, for
each $x, y \in K$,

(2.6)   $x \oplus_w y = \sum_{\alpha \in I} w(\alpha)\, x^{\alpha} y^{1-\alpha}.$

Connes and Consani considered the above expression for continuous
$w(\alpha) \geq 1$ and found in [10], [11] that $\oplus_w$ is commutative,
associative and has identity if and only if
$w(\alpha) = \rho^{\mathrm{Sh}(\alpha)}$ for some $\rho \in K$ greater than one.

For simplicity and clarity of intention, we will write $\rho = e^{T}$ for some
$T \geq 0$ to suggest $T$ behaves like a temperature parameter. In all the
arguments that follow, one could replace $e^{T}$ by $\rho$ again and be fine over
the more general semifields.

Correspondingly, we are going to restrict our attention to sums of the form

(2.7)   $x \oplus_S y := \sum_{\alpha \in I} e^{T S(\alpha)}\, x^{\alpha} y^{1-\alpha},$

where $S$ will be interpreted as an entropy function. In particular, we assume
$S$ is concave and closed, so that $e^{-TS(\alpha)}$ is convex and closed, and we
can use the Legendre transform machinery developed in §2.2.

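We can then formulate the result of [11] on the characteristic one Witt
construction in the following way.

Theorem 2.5. Suppose $S : I \to \mathbb{R}_{\geq 0}$ is concave and closed. The
following hold.

(1) $x \oplus_S y = y \oplus_S x$ $\forall x, y \in K$ iff $S(\alpha) = S(1-\alpha)$.

(2) $0 \oplus_S x = x$ $\forall x \in K$ iff $S(0) = 0$.

(3) $x \oplus_S 0 = x$ $\forall x \in K$ iff $S(1) = 0$.

(4) $x \oplus_S (y \oplus_S z) = (x \oplus_S y) \oplus_S z$ $\forall x, y, z \in K$ iff
$S(\alpha\beta) + (1 - \alpha\beta)\, S\big(\tfrac{\alpha(1-\beta)}{1-\alpha\beta}\big) = S(\alpha) + \alpha S(\beta)$.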

Proof. The argument is given in [10] in a more general form applicable to a binary
operation as in (1.2), but we give the explicit proof here to show the machinery.

(1) Since $S$ is concave and closed, $e^{-TS}$ is convex and closed (in the
generalized sense of (2.2)), as is $z^{L(\alpha)}$ for any linear function
$L(\alpha)$ and $z \in K$. We also see that products of convex and closed
functions are convex and closed, so $y^{\alpha-1} e^{-TS(\alpha)}$ and
$y^{\alpha-1} e^{-TS(1-\alpha)}$ are each convex and closed. We see that
$x \oplus_S y = y \oplus_S x$ iff

$\sum_{\alpha} \frac{x^{\alpha}}{y^{\alpha-1} e^{-TS(\alpha)}} = \sum_{\alpha} \frac{x^{\alpha}}{y^{\alpha-1} e^{-TS(1-\alpha)}}.$

We recognize the Legendre transform of closed convex functions, which is invertible
by the Fenchel-Moreau theorem above. Thus, the summands must be equal, so
$S(\alpha) = S(1-\alpha)$. The converse is obvious.

(2) First note that, when $\alpha \neq 0$, for every $x$,
$0^{\alpha} x^{1-\alpha} = 0$ and $e^{TS(\alpha)} > 0$, so the supremum occurs at
$\alpha = 0$. Therefore, we have $0 \oplus_S x = e^{TS(0)} x$.

(3) Similarly, this supremum occurs at $\alpha = 1$, so
$x \oplus_S 0 = e^{TS(1)} x$.

(4) As in fact 1, we see $x \oplus_S (y \oplus_S z) = (x \oplus_S y) \oplus_S z$ iff

$\sum_{\alpha,\beta} \frac{x^{\alpha\beta}}{y^{\alpha(\beta-1)} z^{\alpha-1} e^{-T(S(\alpha)+\alpha S(\beta))}} = \sum_{u,v} \frac{x^{u}}{y^{v(u-1)} z^{(v-1)(1-u)} e^{-T(S(u)+(1-u)S(v))}}.$

Identifying powers and inverting the Legendre transform yields the condition. The
converse is immediate. □

We hold off discussing the fact that the Shannon entropy Sh is the only function
$S$ satisfying all of these properties until §3 below, where we develop the
information theoretic interpretation of these axioms.
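The four conditions of Theorem 2.5 are easy to test numerically for the Shannon entropy (1.4). The sketch below is such a check, with $C = 1$ by default; the helper names `sh` and `assoc_defect` and the sample grid are ours.

```python
import math


def sh(p: float, C: float = 1.0) -> float:
    """Binary Shannon entropy (1.4), with the convention 0 log 0 = 0."""
    return -C * sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)


def assoc_defect(S, a: float, b: float) -> float:
    """Left side minus right side of condition (4) of Theorem 2.5 at (alpha, beta) = (a, b)."""
    ab = a * b
    return S(ab) + (1 - ab) * S(a * (1 - b) / (1 - ab)) - (S(a) + a * S(b))


grid = [i / 10 for i in range(1, 10)]
max_defect = max(abs(assoc_defect(sh, a, b)) for a in grid for b in grid)
symmetry_defect = max(abs(sh(a) - sh(1 - a)) for a in grid)
```

Both sides of condition (4) compute the entropy of the ternary distribution $(\alpha\beta,\ \alpha(1-\beta),\ 1-\alpha)$ by grouping its outcomes in two different ways, which is why the defect vanishes up to rounding.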

3. Axioms for Entropy Functions 

It is well known that the Shannon entropy admits an axiomatic characterization
in terms of the Khinchin axioms [26]. These are usually stated as follows for an
information measure $S(p_1, \ldots, p_n)$:

(1) (Continuity) For any $n \in \mathbb{N}$, the function $S(p_1, \ldots, p_n)$ is
continuous with respect to $(p_1, \ldots, p_n)$ in the simplex
$\Delta_n = \{p_i \in \mathbb{R}_+, \sum_i p_i = 1\}$;

(2) (Maximality) Given $n \in \mathbb{N}$ and $(p_1, \ldots, p_n) \in \Delta_n$,
the function $S(p_1, \ldots, p_n)$ has its maximum at the uniform distribution
$p_i = 1/n$ for all $i = 1, \ldots, n$,

(3.1)   $S(p_1, \ldots, p_n) \leq S\big(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\big), \quad \forall (p_1, \ldots, p_n) \in \Delta_n;$

(3) (Additivity) If $p_i = \sum_{j=1}^{m_i} p_{ij}$ with $p_{ij} \geq 0$, then

(3.2)   $S(p_{11}, \ldots, p_{n m_n}) = S(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i\, S\big(\tfrac{p_{i1}}{p_i}, \ldots, \tfrac{p_{i m_i}}{p_i}\big);$

(4) (Expandability) Embedding a simplex $\Delta_n$ as a face inside a simplex
$\Delta_{n+1}$ has no effect on the entropy,

(3.3)   $S(p_1, \ldots, p_n, 0) = S(p_1, \ldots, p_n).$

It is shown in [55] that there is a unique information measure
$S(p_1, \ldots, p_n)$ (up to a multiplicative constant $C > 0$) that satisfies
these axioms and it is given by the Shannon entropy

(3.4)   $S(p_1, \ldots, p_n) = \mathrm{Sh}(p_1, \ldots, p_n) := -C \sum_{i=1}^{n} p_i \log p_i.$
We focus now on the $n = 2$ case, which means that we are only looking at
$S(p) := S(p, 1-p)$ instead of the more general $S(p_1, \ldots, p_n)$. In other
words, we are only considering the information theory of binary random variables.
In this case, we describe here an axiomatic formulation for the Shannon entropy
based on properties of binary "decision machines". We return to discuss the more
general $n$-ary case in §10 below.

A decision machine is a measurement tool which may only distinguish between
two possible states of a discrete random variable; such machines can only answer
"yes" or "no". We would like to measure the average change in uncertainty after
a measurement, which is how we define the entropy associated with a random
variable. Let $X$ be a binary random variable and $S(X)$ the change in entropy
after measuring $X$. All information is created equal, so $S(X)$ should only
depend on the probability of measuring a certain value of $X$ and should do so
continuously.



(1) (Left Identity) $S(0) = 0$.

(2) (Right Identity) $S(1) = 0$.

(3) (Commutativity) $S(p) = S(1-p)$.

(4) (Associativity) $S(p_1) + (1-p_1)\, S\big(\tfrac{p_2}{1-p_1}\big) = S(p_1+p_2) + (p_1+p_2)\, S\big(\tfrac{p_1}{p_1+p_2}\big)$.



The identity axioms claim that trivial measurements give trivial information. 
The commutativity axiom claims that questions have the same information as 
their negative. 

The associativity axiom claims a certain equivalence of guessing strategies, which
will be a key observation in our explanation of the characteristic one Witt
construction. If instead of a binary random variable, we want to measure a ternary
random variable $X$ which may take values $X \in \{x_1, x_2, x_3\}$ with
corresponding probabilities $p_1, p_2, p_3$, we can still determine $X$ by asking
yes-or-no questions. We can first ask "is $X = x_1$?" If the answer is no (which
occurs with probability $p_2 + p_3$), we then ask "is $X = x_2$?" This corresponds
to an average change in uncertainty
$S(p_1) + (p_2 + p_3)\, S\big(\tfrac{p_2}{p_2+p_3}\big)$. However, we could have
asked "is $X = x_1$ or $x_2$?" followed by "is $X = x_1$?" and in the end received
the same data about $X$. Associativity asserts these two should be equal, hence we
have the axiom as stated above.
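The two questioning strategies can be compared directly. In the sketch below the Shannon entropy makes both strategies equally informative, while a different concave function does not. The comparison function `gini`, $S(p) = p(1-p)$, is our own illustrative choice and does not appear in the text; it satisfies the identity and commutativity axioms but not associativity.

```python
import math


def sh(p: float) -> float:
    """Binary Shannon entropy with C = 1 and 0 log 0 = 0."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)


def gini(p: float) -> float:
    """Illustrative non-Shannon measure (an assumption, not from the paper)."""
    return p * (1.0 - p)


def cost_one_at_a_time(S, p1, p2, p3):
    """Ask "is X = x1?", then, with probability p2 + p3, ask "is X = x2?"."""
    return S(p1) + (p2 + p3) * S(p2 / (p2 + p3))


def cost_pair_first(S, p1, p2, p3):
    """Ask "is X = x1 or x2?", then, with probability p1 + p2, ask "is X = x1?"."""
    return S(p1 + p2) + (p1 + p2) * S(p1 / (p1 + p2))


probs = (0.5, 0.25, 0.25)
shannon_gap = abs(cost_one_at_a_time(sh, *probs) - cost_pair_first(sh, *probs))
gini_gap = abs(cost_one_at_a_time(gini, *probs) - cost_pair_first(gini, *probs))
```

For these probabilities both Shannon costs equal the ternary entropy $\tfrac{3}{2}\log 2$, whereas the two `gini` costs are $0.375$ and approximately $0.354$.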

The names of the axioms in the above list are chosen to suggest the corresponding
algebraic properties, as we see in Theorem 4.2 below. In fact, we find that these
algebraically motivated axioms are equivalent to the Khinchin axioms.

Theorem 3.1. There is a unique function (up to a multiplicative constant $C > 0$)
satisfying all of the axioms above, namely the Shannon entropy

(3.5)   $\mathrm{Sh}(p) = -C\,(p \log p + (1-p)\log(1-p)).$

Proof. The result follows either by checking directly the equivalence of the
commutativity, identity and associativity axioms with the Khinchin axioms, or else
by proceeding as in Theorem 5.3 of [10]. We prove it here by showing that one
obtains the Khinchin axioms for entropy.

Suppose $S$ satisfies all the conditions above. Define
$S_n : \Delta_{n-1} \to \mathbb{R}_{\geq 0}$ by

(3.6)   $S_n(p_1, \ldots, p_n) = \sum_{1 \leq j \leq n-1} \Big(1 - \sum_{1 \leq i < j} p_i\Big)\, S\Big(\frac{p_j}{1 - \sum_{1 \leq i < j} p_i}\Big).$

Lemma 3.2. $S_n$ is symmetric.
Proof. Suppose we interchange the terms $p_k$ and $p_{k+1}$, where $k < n - 1$.
This only affects the $k$th and $(k+1)$th terms, so we must show

$T = \Big(1 - \sum_{i<k} p_i\Big)\, S\Big(\frac{p_k}{1 - \sum_{i<k} p_i}\Big) + \Big(1 - \sum_{i<k+1} p_i\Big)\, S\Big(\frac{p_{k+1}}{1 - \sum_{i<k+1} p_i}\Big)$

is symmetric. Write $\beta = 1 - \sum_{i<k} p_i$, $a = p_k/\beta$,
$b = p_{k+1}/\beta$. We see $\beta$ is invariant under this permutation, and

$T = \beta\,\big(S(a) + (1 - a)\, S(b/(1 - a))\big).$

Permuting $p_k$ and $p_{k+1}$ interchanges $a$ and $b$, and so $T$ is invariant by
the associativity condition. Interchanging $p_{n-1}$ and $p_n$ only affects the
last term, and it is easy to see it affects it like $S(a) \mapsto S(1 - a)$, so
invariance follows from commutativity. These transpositions generate the symmetric
group $\mathrm{Sym}_n$, so $S_n$ is symmetric. □

From this lemma and the definition we see the following holds. 

Lemma 3.3. Let $(J_k)_{1 \leq k \leq m}$ be a partition of $\{p_1, \ldots, p_n\}$
and let $S_n$ be defined as in (3.6). Then we have

$S_n(p_1, \ldots, p_n) = S_m(q_1, \ldots, q_m) + \sum_{1 \leq k \leq m} q_k\, S_{|J_k|}(J_k / q_k),$

where $q_k = \sum_{p \in J_k} p$, so $J_k/q_k$ is a $|J_k|$-ary probability
distribution.
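The following sketch illustrates Lemmas 3.2 and 3.3 for the binary Shannon entropy. It assumes that $S_n$ is assembled from $S$ by asking "is $X = x_j$?" for $j = 1, \ldots, n-1$ in turn, each question weighted by the probability that it is reached, which matches the terms appearing in the proof of Lemma 3.2. The function names and the sample distribution are ours.

```python
import math
from itertools import permutations


def sh2(p: float) -> float:
    """Binary Shannon entropy with C = 1 and 0 log 0 = 0."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)


def chain_entropy(S, probs):
    """Sum over j < n of (1 - sum_{i<j} p_i) * S(p_j / (1 - sum_{i<j} p_i))."""
    total, remaining = 0.0, 1.0
    for p in probs[:-1]:
        if remaining <= 0.0:
            break
        # min(...) guards against ratios that exceed 1 by a rounding error.
        total += remaining * S(min(p / remaining, 1.0))
        remaining -= p
    return total


def shannon(probs):
    """n-ary Shannon entropy (3.4) with C = 1."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


dist = (0.1, 0.2, 0.3, 0.4)
values = [chain_entropy(sh2, perm) for perm in permutations(dist)]

# Grouping {0.1, 0.2} and {0.3, 0.4}: entropy of the groups plus weighted inner entropies.
grouped = sh2(0.3) + 0.3 * sh2(0.1 / 0.3) + 0.7 * sh2(0.3 / 0.7)
```

All 24 orderings give the same value, which agrees with the $n$-ary Shannon entropy and with the grouped decomposition.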

These lemmas take care of the third Khinchin axiom, and with the identity
property also take care of the fourth. We assumed at the outset that $S$ was
continuous, so it follows from the definition that $S_n$ is continuous, which is
the first axiom. What remains is the second axiom, which we write here in terms of
information (concave) rather than entropy (convex).

Lemma 3.4. $S_n$ is concave for all $n$.

Proof. We proceed by induction on $n$. We have already assumed $S_2 = S$ is
concave, so suppose $S_n$ is concave for some $n \geq 2$. Note that for continuous
$f$, concavity follows from $f\big(\tfrac{x+y}{2}\big) \geq \tfrac{f(x)+f(y)}{2}$.
Thus we consider, for some $(p_i), (q_i) \in \Delta_{n+1}$,

$S_{n+1}\Big(\frac{p_1}{2} + \frac{q_1}{2}, \ldots, \frac{p_{n+1}}{2} + \frac{q_{n+1}}{2}\Big).$

By the previous lemma, this equals

$S_n\Big(\frac{p_1+p_2}{2} + \frac{q_1+q_2}{2},\ \frac{p_3}{2} + \frac{q_3}{2},\ \ldots,\ \frac{p_{n+1}}{2} + \frac{q_{n+1}}{2}\Big) + \frac{p_1+p_2+q_1+q_2}{2}\, S\Big(\frac{p_1+q_1}{p_1+p_2+q_1+q_2}\Big).$

By the inductive hypothesis we then have

$S_{n+1}(\ldots) \geq \tfrac{1}{2} S_n(p_1+p_2, \ldots, p_{n+1}) + \tfrac{1}{2} S_n(q_1+q_2, \ldots, q_{n+1}) + \frac{p_1+p_2+q_1+q_2}{2}\, S\Big(\frac{p_1+q_1}{p_1+p_2+q_1+q_2}\Big).$

We see that

$\frac{p_1+q_1}{p_1+p_2+q_1+q_2} = \frac{p_1+p_2}{p_1+p_2+q_1+q_2} \cdot \frac{p_1}{p_1+p_2} + \frac{q_1+q_2}{p_1+p_2+q_1+q_2} \cdot \frac{q_1}{q_1+q_2}$

and

$\frac{p_1+p_2}{p_1+p_2+q_1+q_2} + \frac{q_1+q_2}{p_1+p_2+q_1+q_2} = 1,$

so by the concavity of $S$ we have

$S_{n+1}(\ldots) \geq \tfrac{1}{2} S_n(p_1+p_2, \ldots, p_{n+1}) + \tfrac{1}{2} S_n(q_1+q_2, \ldots, q_{n+1}) + \frac{p_1+p_2}{2}\, S\Big(\frac{p_1}{p_1+p_2}\Big) + \frac{q_1+q_2}{2}\, S\Big(\frac{q_1}{q_1+q_2}\Big),$

from which concavity of $S_{n+1}$ follows by the previous lemma. □

Since $S_n$ is concave, it has a unique maximum, and since it is symmetric, this
maximum occurs at $S_n\big(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\big)$, implying the
second Khinchin axiom. This then completes the proof of Theorem 3.1. □

A reformulation of the Khinchin axioms for Shannon entropy more similar to the
commutativity, identity and associativity axioms considered here was described in
Faddeev's [17]. For different reformulations of the Khinchin axioms see also [13].

4. Thermodynamic semirings 

We now consider more general thermodynamic semirings. The following defini- 
tion describes the basic structure. 

Definition 4.1. A thermodynamic semiring structure over $K$, written like
$\mathbb{R}^{\min,+} \cup \{\infty\}$, is a collection of binary operations
$\oplus_{S,T} : K \times K \to K$ indexed by $T \in \mathbb{R} \cup \{\infty\}$
and defined by an information measure $S : [0,1] \to \mathbb{R}$ according to

(4.1)   $x \oplus_{S,T} y = \min_{p \in [0,1] \cap \mathbb{Q}} \big(px + (1-p)y - T S(p)\big).$

It is often convenient to consider the elements of the semiring as functions of
$T$, with the operation $\oplus_S$ defined pointwise by $\oplus_{S,T}$. We call
this ring $R$, inspired by the $p$-typical Witt notation. Indeed in [11], [10],
$R$ is seen as the Witt ring over $K$, with evaluation at $T = 0$ giving the
residue morphism $R \to K$. We then see that the Teichmüller lifts should be the
constant functions, and $T$ should play the role of the exponent of $p$ in
considering field extensions.

We then have the following general properties, as in Theorem 3.1 above (Theorem
5.2 of [10]):

Theorem 4.2. Let $x \oplus_{S,T} y$ be a thermodynamic semiring structure on a
suitably nice characteristic one semifield $K$, defined as in (4.1). Then the
following holds.

(1) $x \oplus_{S,T} y = y \oplus_{S,T} x$ iff $S$ is commutative.

(2) $0 \oplus_{S,T} x = x$ iff $S$ has the left identity property.

(3) $x \oplus_{S,T} 0 = x$ iff $S$ has the right identity property.

(4) $x \oplus_{S,T} (y \oplus_{S,T} z) = (x \oplus_{S,T} y) \oplus_{S,T} z$ iff $S$ is associative.

Proof. The cases of commutativity and of the identity axioms are obvious. For
associativity we have

$x \oplus_{S,T} (y \oplus_{S,T} z) = x \oplus_{S,T} \min_{p} \big(py + (1-p)z - TS(p)\big)$

$= \min_{q} \big(qx + (1-q) \min_{p} (py + (1-p)z - TS(p)) - TS(q)\big)$

$= \min_{p,q} \big(qx + p(1-q)y + (1-q)(1-p)z - T(S(q) + (1-q)S(p))\big)$

$= \min_{p_1+p_2+p_3=1} \Big(p_1 x + p_2 y + p_3 z - T\Big(S(p_1) + (1-p_1)\, S\Big(\frac{p_2}{1-p_1}\Big)\Big)\Big)$

while

$(x \oplus_{S,T} y) \oplus_{S,T} z = \min_{p} \big(px + (1-p)y - TS(p)\big) \oplus_{S,T} z$

$= \min_{p,q} \big(pqx + q(1-p)y + (1-q)z - T(qS(p) + S(q))\big)$

$= \min_{p_1+p_2+p_3=1} \Big(p_1 x + p_2 y + p_3 z - T\Big(S(p_1+p_2) + (p_1+p_2)\, S\Big(\frac{p_1}{p_1+p_2}\Big)\Big)\Big).$

We see that the two ways of summing three quantities correspond to the two ways
of measuring a ternary random variable with decision machines. The equivalence
is now obvious. □
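Part (4) of Theorem 4.2 can be observed numerically by approximating the minimum in (4.1) on a finite grid of rational $p$. The sketch below does this for the Shannon entropy and for the function $S(p) = p(1-p)$; the latter is our own illustrative choice of a commutative but non-associative measure, and the grid size, sample values and tolerances are likewise assumptions of the sketch.

```python
import math


def sh(p: float) -> float:
    """Binary Shannon entropy with C = 1 and 0 log 0 = 0."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)


def gini(p: float) -> float:
    """Illustrative non-associative measure (an assumption, not from the paper)."""
    return p * (1.0 - p)


def oplus(S, T, x, y, steps=1000):
    """Grid approximation of min over p in [0, 1] of p*x + (1-p)*y - T*S(p), as in (4.1)."""
    return min(
        (i / steps) * x + (1 - i / steps) * y - T * S(i / steps)
        for i in range(steps + 1)
    )


def assoc_gap(S, T, x, y, z):
    """Absolute difference between the two bracketings of x, y, z."""
    left = oplus(S, T, x, oplus(S, T, y, z))
    right = oplus(S, T, oplus(S, T, x, y), z)
    return abs(left - right)


x, y, z, T = 0.0, 0.2, 0.4, 1.0
shannon_gap = assoc_gap(sh, T, x, y, z)   # zero up to grid error
gini_gap = assoc_gap(gini, T, x, y, z)    # about 0.022
```

For these values the two bracketings with `gini` evaluate to $-0.2304$ and $-0.2084$, so the operation is visibly non-associative, while the Shannon gap is of the order of the grid error.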

Most information measures are commutative, though we give a non-commutative
example in §8 below. We discuss in §5 some physical reasons why commutativity is
more automatic in this context than associativity.

One then sees by direct inspection that, in the case of the Shannon entropy, one
has the following form of the thermodynamic semiring structure.

Proposition 4.3. When $S$ is the Shannon entropy, Sh, then

(4.2)   $x \oplus_{\mathrm{Sh},T} y = -T \log\big(e^{-x/T} + e^{-y/T}\big)$

over $\mathbb{R}^{\min,+} \cup \{\infty\}$, while over
$\mathbb{R}^{\max,\times}_{\geq 0}$ it is

(4.3)   $x \oplus_{\mathrm{Sh},T} y = \big(x^{1/T} + y^{1/T}\big)^{T}.$

Notice that the semiring $\mathbb{R}^{\max,\times}_{\geq 0}$ is isomorphic to the
semiring $\mathbb{R}^{\min,+} \cup \{\infty\}$, under the $-\log$ mapping, so that
(4.3) is simply obtained from (4.2) in this way.

In this case, the parameter $T$ corresponds to the parameter $h$ of Maslov
dequantization (see the comments in §11.2). The semifields obtained in this way
are known as the Gibbs-Maslov semirings and the subtropical algebra (see [34], [31]).
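Proposition 4.3 can be checked numerically. The sketch below compares a grid minimization of (4.1) with the closed form (4.2), transports it to (4.3) under $-\log$, and looks at the low temperature limit. We take $C = 1$ in the Shannon entropy; the helper names and the grid resolution are ours.

```python
import math


def sh(p: float) -> float:
    """Binary Shannon entropy with C = 1 and 0 log 0 = 0."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)


def oplus_grid(T, x, y, steps=20000):
    """Grid approximation of the minimum in (4.1) for S = Sh."""
    return min(
        (i / steps) * x + (1 - i / steps) * y - T * sh(i / steps)
        for i in range(steps + 1)
    )


def oplus_closed(T, x, y):
    """Right-hand side of (4.2)."""
    return -T * math.log(math.exp(-x / T) + math.exp(-y / T))


def oplus_maxtimes(T, u, v):
    """Right-hand side of (4.3), for u, v > 0."""
    return (u ** (1 / T) + v ** (1 / T)) ** T


cases = [(1.0, 0.3, 1.1), (0.5, 2.0, 1.0)]
grid_errors = [abs(oplus_grid(T, x, y) - oplus_closed(T, x, y)) for T, x, y in cases]
iso_errors = [
    abs(-math.log(oplus_maxtimes(T, math.exp(-x), math.exp(-y))) - oplus_closed(T, x, y))
    for T, x, y in cases
]
cold = oplus_closed(1e-3, 0.3, 1.1)   # close to min(0.3, 1.1)
```

As $T \to 0$ the operation degenerates to the tropical minimum, in line with the role of $T$ as a dequantization parameter.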

One can extend the notion of thermodynamic semiring to include a class of
semirings of functions which we will be considering in the following. Just as in
the case of a ring $R$ and a parameter space $\Xi$, one can endow the set of
functions from $\Xi$ to $R$ with a ring structure, by pointwise operations, one
can proceed similarly with a semiring. Moreover, in the case of a thermodynamic
semiring structure, it is especially interesting to consider cases where the
pointwise operation $\oplus_{S,T}$ depends on the point in the parameter space
through a varying entropy function $S = S_\eta$, for $\eta \in \Xi$.

Definition 4.4. Let $\Xi$ be a compact Hausdorff space and let $S = (S_\eta)$ be a
family of information measures depending continuously on the parameter
$\eta \in \Xi$. Let $K = \mathbb{R}^{\min,+} \cup \{\infty\}$. A thermodynamic
semiring structure on the space of functions $C(\Xi, R)$ is given by the family of
pointwise operations

$x(\eta) \oplus_{S_\eta,T} y(\eta) = \min_{p \in [0,1] \cap \mathbb{Q}} \big(p\,x(\eta) + (1-p)\,y(\eta) - T S_\eta(p)\big).$



The properties of Theorem 4.2 extend to this case. We will return to this more
general setting in §8 below.

As we discuss in the following sections, more general entropy functions (which
include the special cases of Rényi entropy, Tsallis entropy and Kullback-Leibler
divergence, as well as the more general categorical and operadic setting developed
in §10) give rise to thermodynamic algebraic structures that are neither
commutative nor associative. We will continue to use the terminology "semiring",
although (as the referee pointed out to us) the term "algebra", in the sense of
the theory of universal algebra, would be more appropriate.

5. Statistical mechanics 

Before we move on to see explicit examples of thermodynamic semirings besides
the original one based on the Shannon entropy considered already in [10] and
[11], we give in this section a physical interpretation of the algebraic structure
of thermodynamic semirings in terms of statistical mechanics. This interpretation
is a generalization of thermodynamic interpretations of max-plus formulas found in
[43].

When $K = \mathbb{R}^{\max,\times}_{\geq 0}$, we can write the thermodynamic
semiring operations in the form

$x \oplus_{\rho,S} y = \max_{p} \big(\rho^{S(p)} x^{p} y^{1-p}\big).$

In particular, when we set $\rho = e^{k_B T}$, this reads

$\max_{p} \big(e^{k_B T S(p) + p \log x + (1-p) \log y}\big).$

We recognize this as $e^{-F_{\mathrm{eq}}} = Z$, where $F_{\mathrm{eq}}$ is the
equilibrium value of the free energy of a system at temperature $T$, containing a
gas of particles with chemical potentials $\log x$ and $\log y$, and Hamiltonian

(5.1)   $H = p \log x + (1-p) \log y,$

where $p$ is now thought of as a mole fraction, and $Z$ is its partition function.

Indeed, the semirings $\mathbb{R}^{\max,*}_{\geq 0}$ and $\mathbb{R}^{\min,+} \cup \{\infty\}$ are isomorphic by $-\log$, and
this gives

$\log x \oplus_{S,k_B T} \log y = \min_p \big( p \log x + (1-p) \log y - k_B T S(p) \big),$

which is the equilibrium free energy described above. We note also the calculated
form of the thermodynamic semiring for Shannon entropy, that is

$x \oplus_{\mathrm{Sh}} y = -T \log\big( e^{-x/T} + e^{-y/T} \big).$

In it, we recognize precisely the partition sum of a two state system with energies 
X and y. We thus consider members of the Witt ring R (see Q to be temperature 
dependent chemical potentials. 
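The closed form for the Shannon case can be checked numerically. The following is a minimal sketch, assuming the min-plus convention $x \oplus_{\mathrm{Sh}} y = \min_p (px + (1-p)y - T\,\mathrm{Sh}(p))$ with natural logarithms; the function names, the grid resolution and the sample values of $x$, $y$, $T$ are illustrative choices, not part of the original text.

```python
import math


def shannon(p):
    """Binary Shannon entropy Sh(p) in nats, reading 0*log(0) as 0."""
    return -sum(t * math.log(t) for t in (p, 1.0 - p) if t > 0.0)


def witt_sum_grid(x, y, T, steps=20000):
    """Grid approximation of min over p in [0,1] of p*x + (1-p)*y - T*Sh(p)."""
    best = math.inf
    for k in range(steps + 1):
        p = k / steps
        best = min(best, p * x + (1.0 - p) * y - T * shannon(p))
    return best


def witt_sum_closed(x, y, T):
    """Closed form -T*log(exp(-x/T) + exp(-y/T)), the two-state free energy."""
    return -T * math.log(math.exp(-x / T) + math.exp(-y / T))


# Hypothetical chemical potentials and temperature, chosen only for illustration.
x, y = 1.0, 2.5
gap = abs(witt_sum_grid(x, y, 0.7) - witt_sum_closed(x, y, 0.7))
```

At small $T$ the closed form approaches $\min(x, y)$, which is the zero-temperature (tropical) addition.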

In a gas system with a single type of particle, the free energy is precisely the
chemical potential. The mixing of these gases gives a new free energy dependent on
the entropy function. We then replace this mixture with a "particle" whose chemical
potential is the equilibrium free energy per particle of the previous mixture. This
gives a monoid structure on the space of chemical potentials. When we consider
mixing in arbitrary thermodynamics, i.e. with non-Boltzmann counting, the mixing
may be non-associative. With this interpretation, however,
we would not expect the mixing process ever to be non-commutative, so the lack
of associativity has a more direct and natural physical interpretation than the lack
of commutativity for thermodynamic semirings. We imagine multiplication to be a
sort of bonding of gases, where chemical potentials add together.
We see the dynamics of this mixing process is determined, both physically and
algebraically, by the entropy function and the ambient temperature. At zero
temperature, the mixture is always entirely composed of the particle with the least chemical
potential. This corresponds to $x \oplus_{S,0} y = \min(x, y)$, and indeed evaluation at zero temperature
gives us the residue morphism $R \to K$. When the entropy function is the Shannon
entropy, we get the normal thermodynamical mixing, see §8.5 of [16]. We can say,
therefore, that the Witt construction is, in a sense, giving thermodynamics to this
system. Note that in p.2p this construction is seen giving an inverse to Maslov 
dequantization, pointing out an interesting link between quantum mechanics and 
thermodynamics. 

The mixing entropy for chemical systems based on the Boltzmann-Gibbs statistical
mechanics and the Shannon entropy function (as in §8.5 of [16] for instance)
works well to describe systems that are ergodic. If a system is nonergodic (that
is, time averages and phase space averages differ), then the counting involved in
bringing two initially separated systems into contact will not follow the normal
Boltzmann rules. As a result, Shannon entropy will not behave extensively in these
systems. This typically occurs in physical systems with strong, long-range coupling
and in systems with metastable states or exhibiting power law behavior. In such
systems, maximizing the Shannon entropy functional (subject to the dynamical
constraints of the system) does not produce the correct metaequilibrium distribution,
see for instance [52] and other essays in the collection [21].

A broad field of nonextensive statistical mechanics for such systems has been
developed (see [52] for a brief introduction), where, under suitable conditions, one
can calculate a "correct" entropy functional corresponding to the system at hand.
These entropy functionals are typically characterized by some axiomatic properties
that describe their behavior. For instance, if we have two initially independent
systems $A$, $B$ and bring them together to form a combined system denoted $A \star B$,
one may require that $S(A \star B) = S(A) + S(B)$ (extensive). This leads to forms of
entropy such as the Renyi entropy [45], generalizing the original Shannon case, while
maintaining the extensivity over independent systems. One may also have explicit
$q$-deformations of the extensivity condition, for example
$S_q(A \star B) = S_q(A) + S_q(B) + (1-q) S_q(A) S_q(B)$ for independent systems. This leads to forms of entropy
such as the Tsallis entropy [53].
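The two composition rules can be illustrated on finite distributions. This is a minimal sketch, assuming the standard n-ary forms $\mathrm{Ry}_q(P) = \frac{1}{1-q}\log\sum_i p_i^q$ and $S_q(P) = \frac{1}{q-1}(1 - \sum_i p_i^q)$; the sample distributions and the helper names are hypothetical.

```python
import math
from itertools import product


def renyi(dist, q):
    """Renyi entropy of a finite distribution (q != 1), in nats."""
    return math.log(sum(p ** q for p in dist)) / (1.0 - q)


def tsallis(dist, q):
    """Tsallis entropy of a finite distribution (q != 1)."""
    return (1.0 - sum(p ** q for p in dist)) / (q - 1.0)


def join_independent(A, B):
    """Joint distribution of two independent systems A and B."""
    return [pa * pb for pa, pb in product(A, B)]


# Illustrative distributions and deformation parameter (assumed values).
A = [0.2, 0.8]
B = [0.5, 0.3, 0.2]
q = 1.7
AB = join_independent(A, B)

# Renyi entropy is additive over independent systems.
renyi_gap = renyi(AB, q) - (renyi(A, q) + renyi(B, q))

# Tsallis entropy is additive only up to the q-deformed correction term.
tsallis_gap = tsallis(AB, q) - (
    tsallis(A, q) + tsallis(B, q) + (1.0 - q) * tsallis(A, q) * tsallis(B, q)
)
naive_gap = tsallis(AB, q) - (tsallis(A, q) + tsallis(B, q))
```

Here `renyi_gap` and `tsallis_gap` vanish up to rounding, while `naive_gap` does not, which is the nonextensivity described above.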

When we consider different kinds of entropy functions in this way, we can look
at the algebraic properties of the corresponding thermodynamic semirings. These
will encode the information about the amount of nonextensivity and nonergodicity
of the system giving rise to the corresponding entropy function $S$. We can imagine
non-associativity of mixing as a toy model of meta-equilibrium states where we
know the entropy beforehand. We can also use thermodynamic semirings to encode
relative entropies and analyze their behavior over a space of parameters through the
algebraic properties of the semiring.

Relations between idempotent semifields and statistical mechanics were also
considered in [23], [25], [43].
6. The Renyi entropy

We now look at other important examples of entropy functions and we investigate
how the algebraic properties of the associated thermodynamic
semiring detect the properties of the entropy function as an information measure.

A first well known example of an entropy function that naturally generalizes
the Shannon entropy is the Renyi entropy [45]. This is a one-parameter family
$\mathrm{Ry}_\alpha$ of information measures defined as

(6.1) $\mathrm{Ry}_\alpha(p_1, \dots, p_n) := \frac{1}{1-\alpha} \log\Big( \sum_i p_i^\alpha \Big),$

so that the limit

(6.2) $\lim_{\alpha \to 1} \mathrm{Ry}_\alpha(p_1, \dots, p_n) = \mathrm{Sh}(p_1, \dots, p_n)$

recovers the Shannon entropy. The Renyi entropy has a broad range of applications,
especially in the analysis of multifractal systems [6], while a statistical mechanics
based on the Renyi entropy is described in [29].

The Renyi entropy also has an axiomatic characterization, where one weakens
the Khinchin additivity axioms to a form that only requires additivity of the
information entropy for independent subsystems, while keeping the other three axioms
unchanged [46]. For our version of the axioms, formulated in terms of decision
machines, this means that the associativity axiom no longer holds.

Lemma 6.1. The lack of associativity of $x \oplus_S y$, when $S = \mathrm{Ry}_\alpha$ is the Renyi
entropy

(6.3) $\mathrm{Ry}_\alpha(p) = \frac{1}{1-\alpha} \log\big( p^\alpha + (1-p)^\alpha \big),$

is measured by the transformation $(p_1, p_2, p_3) \mapsto (p_3, p_2, p_1)$.
Proof. Write $p_3 = 1 - p_1 - p_2$. We have

$\mathrm{Ry}_\alpha(p_1) + (1-p_1)\,\mathrm{Ry}_\alpha\big(\tfrac{p_2}{1-p_1}\big)$

$= \frac{1}{1-\alpha} \Big[ \log\big( p_1^\alpha + (1-p_1)^\alpha \big) + (1-p_1) \log\Big( \big(\tfrac{p_2}{1-p_1}\big)^\alpha + \big(\tfrac{p_3}{1-p_1}\big)^\alpha \Big) \Big]$

$= \frac{1}{1-\alpha} \Big[ (1-p_1) \log\big( p_2^\alpha + p_3^\alpha \big) + \log\big( p_1^\alpha + (1-p_1)^\alpha \big) - \alpha (1-p_1) \log(1-p_1) \Big].$

On the other hand, we have

$\mathrm{Ry}_\alpha(p_1+p_2) + (p_1+p_2)\,\mathrm{Ry}_\alpha\big(\tfrac{p_1}{p_1+p_2}\big)$

$= \mathrm{Ry}_\alpha(1-p_3) + (1-p_3)\,\mathrm{Ry}_\alpha\big(\tfrac{p_1}{1-p_3}\big) = \mathrm{Ry}_\alpha(p_3) + (1-p_3)\,\mathrm{Ry}_\alpha\big(\tfrac{p_2}{1-p_3}\big)$

$= \frac{1}{1-\alpha} \Big[ (1-p_3) \log\big( p_2^\alpha + p_1^\alpha \big) + \log\big( p_3^\alpha + (1-p_3)^\alpha \big) - \alpha (1-p_3) \log(1-p_3) \Big].$

So the failure of associativity is corrected by mapping $(p_1, p_2, p_3) \mapsto (p_3, p_2, p_1)$. In
fact, this holds for any commutative $S$.

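In a commutative non-associative semiring $K$, the lack of associativity is
corrected by the morphism $A$ in the diagram

$\begin{array}{ccccc}
K \otimes K \otimes K & & \xrightarrow{\ A\ } & & K \otimes K \otimes K \\
\downarrow & & & & \downarrow \\
K \otimes K & \longrightarrow & K & \longleftarrow & K \otimes K
\end{array}$

which makes the diagram commutative, and which is simply given by
$A(x \otimes y \otimes z) = z \otimes y \otimes x$. This is exactly the transformation
$(p_1, p_2, p_3) \mapsto (p_3, p_2, p_1)$, as these
correspond to $p_1 = sr$, $p_2 = s(1-r)$ and $p_3 = 1 - (p_1 + p_2)$ in the associativity
constraints. Thus, the transformation $(p_1, p_2, p_3) \mapsto (p_3, p_2, p_1)$ is exactly the one
that identifies $w(s)\,w(r)^s$ with $w(sr)\,w\big(s(1-r)/(1-sr)\big)^{1-sr}$. □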
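The statement of the lemma can be probed numerically. The sketch below assumes the binary Renyi entropy $\mathrm{Ry}_\alpha(p) = \frac{1}{1-\alpha}\log(p^\alpha + (1-p)^\alpha)$ and the two sides of the associativity constraint as written in the proof; the helper names `left` and `right` and the sample point are hypothetical.

```python
import math


def renyi(p, a):
    """Binary Renyi entropy Ry_a(p) for a != 1, in nats."""
    return math.log(p ** a + (1.0 - p) ** a) / (1.0 - a)


def shannon(p):
    """Binary Shannon entropy in nats, reading 0*log(0) as 0."""
    return -sum(t * math.log(t) for t in (p, 1.0 - p) if t > 0.0)


def left(S, p1, p2):
    """S(p1) + (1 - p1) * S(p2 / (1 - p1)): one side of the constraint."""
    return S(p1) + (1.0 - p1) * S(p2 / (1.0 - p1))


def right(S, p1, p2):
    """S(p1 + p2) + (p1 + p2) * S(p1 / (p1 + p2)): the other side."""
    return S(p1 + p2) + (p1 + p2) * S(p1 / (p1 + p2))


# Illustrative point of the simplex and Renyi parameter (assumed values).
p1, p2, p3 = 0.2, 0.3, 0.5
ry = lambda p: renyi(p, 2.0)

renyi_defect = left(ry, p1, p2) - right(ry, p1, p2)      # nonzero: not associative
reversed_defect = left(ry, p3, p2) - right(ry, p1, p2)   # zero after (p1,p2,p3) -> (p3,p2,p1)
shannon_defect = left(shannon, p1, p2) - right(shannon, p1, p2)  # zero: associative
```

The reversal identity only uses the symmetry $S(p) = S(1-p)$, in line with the remark that it holds for any commutative $S$.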

We will show in §9 below that one can introduce a more refined notion of
successor function for thermodynamic semirings, which encodes useful information on
the algebraic structure of the semiring, including the lack of associativity, and on
the thermodynamical properties of the entropy function.

7. The Tsallis entropy 

The Tsallis entropy [53] is a well-studied generalization of Shannon entropy,
currently finding application in the statistical mechanics of nonergodic systems,
[21]. It is defined by

(7.1) $\mathrm{Ts}_\alpha(p) = \frac{1}{\alpha - 1} \big( 1 - p^\alpha - (1-p)^\alpha \big).$

(A slightly more general form will be analyzed in §7.1 below, see (7.2).)

The basic characterizing feature of the Tsallis entropy is the fact that the
extensive property (additivity on independent subsystems) typical of the Shannon
and Renyi entropies is replaced by a nonextensive behavior. This corresponds,
algebraically, to replacing an exponential function (or a logarithm) with an
$\alpha$-deformed exponential (or logarithm), see §2.1 of [52], so that the usual Boltzmann
principle $S = k \log W$ of statistical mechanics is replaced by its deformed version
$S_\alpha = k \log_\alpha W$, where $\log_\alpha(x) = (x^{1-\alpha} - 1)/(1-\alpha)$. Thus, instead of additivity
$S(A \star B) = S(A) + S(B)$ on the combination of independent systems, one obtains
$S_\alpha(A \star B) = S_\alpha(A) + S_\alpha(B) + (1-\alpha)\, S_\alpha(A)\, S_\alpha(B)$. An axiomatic characterization
of the Tsallis entropy is described in [50], [35], and [55].

We consider the thermodynamic semiring as in Definition 4.1 with the
information measure $S$ given by the Tsallis entropy $S = \mathrm{Ts}_\alpha$.

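In this case the failure of the associativity condition for the semiring with the
$\oplus_{S,T}$ operation is measured by comparing the expressions (with $p_3 = 1 - p_1 - p_2$)

$\mathrm{Ts}_\alpha(p_1) + (1-p_1)\,\mathrm{Ts}_\alpha\big(\tfrac{p_2}{1-p_1}\big) = \frac{1}{\alpha-1} \Big( 1 - p_1^\alpha - (1-p_1)^\alpha + (1-p_1) - (1-p_1)^{1-\alpha} \big( p_2^\alpha + p_3^\alpha \big) \Big)$

and

$\mathrm{Ts}_\alpha(p_1+p_2) + (p_1+p_2)\,\mathrm{Ts}_\alpha\big(\tfrac{p_1}{p_1+p_2}\big) = \frac{1}{\alpha-1} \Big( 1 - (p_1+p_2)^\alpha - (1-p_1-p_2)^\alpha + (p_1+p_2) - (p_1+p_2)^{1-\alpha} \big( p_1^\alpha + p_2^\alpha \big) \Big).$

However, an interesting feature of the Tsallis entropy is that the associativity
of the thermodynamic semiring can be restored by a deformation of the operation
$\oplus_{S,T}$, depending on the deformation parameter $\alpha$ which makes sense in the previous
thermodynamic context, so that the Tsallis entropy becomes the unique function
that makes the resulting $\oplus_{S,T,\alpha}$ both commutative and associative.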

7.1. A Witt construction for Tsallis entropy. We show here how to deform
the thermodynamic semiring structure in a one-parameter family $\oplus_{S,T,\alpha}$ for which
$S = \mathrm{Ts}_\alpha$ is the only entropy function that satisfies the associativity constraint,
along with the commutativity and unity axioms.

We consider here a slightly more general form of the Tsallis entropy [21],
viewed as a non-associative information measure, defined by

(7.2) $\mathrm{Ts}_\alpha(p) = \frac{1}{\phi(\alpha)} \big( p^\alpha + (1-p)^\alpha - 1 \big),$

where $\alpha \in \mathbb{R}$ is a parameter and $\phi$ is a continuous function such that $\phi(\alpha)(1-\alpha) > 0$,
whenever $\alpha \neq 1$, with

$\lim_{\alpha \to 1} \phi(\alpha) = 0,$

and such that there exist $0 \leq a < 1 < b$ with the property that $\phi$ is differentiable
on $(a, 1) \cup (1, b)$, and

$\lim_{\alpha \to 1} \frac{d\phi(\alpha)}{d\alpha} < 0.$

Note that this implies that the Tsallis entropy reproduces the Shannon entropy
in the $\alpha \to 1$ limit. A typical choice for the normalization is $\phi(\alpha) = 1 - \alpha$, which
reproduces the form (7.1).

Here we work with the more general form (7.2), as we will be able to ensure
uniqueness only up to a general $\phi$ satisfying the above requirements.

We find that the Tsallis entropy fits nicely into the context of Witt rings with
the following two results.

Theorem 7.1. The Tsallis entropy in the form (7.2) is the unique entropy
function that is commutative, has the identity property, and satisfies the $\alpha$-associativity
condition

(7.3) $S(p_1) + (1-p_1)^\alpha\, S\big(\tfrac{p_2}{1-p_1}\big) = S(p_1+p_2) + (p_1+p_2)^\alpha\, S\big(\tfrac{p_1}{p_1+p_2}\big).$

Proof. We assume a priori that $-S$ is concave and continuous. Therefore, $-S$ has a
unique maximum, which is positive when $S$ is non-trivial, since $S(0) = 0$. Moreover,
$S$ is symmetric, so this maximum must occur at $p = 1/2$. $S$ also has the identity
property and the $\alpha$-associativity, so by Suyari [49] and Furuichi [20], this implies
$S = \mathrm{Ts}_\alpha$, for some $\phi(\alpha)$ satisfying the above properties. The converse follows from
direct application of the arguments given in [50] and [14] and is easily verified. □

The $\alpha$-associativity condition as one of the characterizing properties for the
Tsallis entropy was also discussed in [14].
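As a numerical illustration of the $\alpha$-associativity condition, the sketch below evaluates both sides for the binary Tsallis entropy with the normalization $\phi(\alpha) = 1 - \alpha$. The function `assoc_gap` and its `weight_exp` argument are hypothetical helpers introduced only for this check: `weight_exp = alpha` gives the deformed condition, `weight_exp = 1` the ordinary associativity constraint.

```python
def tsallis(p, a):
    """Binary Tsallis entropy with normalization phi(a) = 1 - a (a != 1)."""
    return (p ** a + (1.0 - p) ** a - 1.0) / (1.0 - a)


def assoc_gap(S, weight_exp, p1, p2):
    """Difference of the two sides of the constraint, with weights raised
    to the power weight_exp."""
    lhs = S(p1) + (1.0 - p1) ** weight_exp * S(p2 / (1.0 - p1))
    rhs = S(p1 + p2) + (p1 + p2) ** weight_exp * S(p1 / (p1 + p2))
    return lhs - rhs


# Illustrative parameter and point (assumed values).
alpha = 2.0
S = lambda p: tsallis(p, alpha)

deformed_gap = assoc_gap(S, alpha, 0.2, 0.3)   # alpha-associativity holds
ordinary_gap = assoc_gap(S, 1.0, 0.2, 0.3)     # ordinary associativity fails
```

With these sample values both deformed sides reduce to $(p_1^\alpha + p_2^\alpha + p_3^\alpha - 1)/(1-\alpha)$, so the deformed gap vanishes while the undeformed one does not.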

We can interpret this $\alpha$-associativity as an associativity of an $\alpha$-deformed Witt
operation as follows. Fix some $\alpha$ and consider

(7.4) $x \oplus_{S,T,\alpha} y = \sum_{s \in I} \rho^{T S(s)}\, x^{s^\alpha}\, y^{(1-s)^\alpha}.$

We then have the following characterization of associativity.

Theorem 7.2. For $\alpha \neq 0$, the operation $\oplus_{S,T,\alpha}$ is associative if and only if $S$ is
$\alpha$-associative, as in (7.3).

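Proof. We find that this operation is associative if and only if

$\sum_{s,r \in I} \rho^{T\left( S(sr) + (1-sr)^\alpha S\left( \frac{s(1-r)}{1-sr} \right) \right)}\, x^{(sr)^\alpha} y^{(s(1-r))^\alpha} z^{(1-s)^\alpha} = \sum_{s,r \in I} \rho^{T\left( S(s) + s^\alpha S(r) \right)}\, x^{(sr)^\alpha} y^{(s(1-r))^\alpha} z^{(1-s)^\alpha}.$

We make the same substitution as earlier, setting $p_1 = sr$, $p_2 = s(1-r)$, $p_3 = 1 - s$.
Then the above condition becomes

$\sum_{p_1+p_2+p_3=1} \rho^{T\left( S(p_1) + (1-p_1)^\alpha S\left( \frac{p_2}{1-p_1} \right) \right)}\, x^{p_1^\alpha} y^{p_2^\alpha} z^{p_3^\alpha} = \sum_{p_1+p_2+p_3=1} \rho^{T\left( S(p_1+p_2) + (p_1+p_2)^\alpha S\left( \frac{p_1}{p_1+p_2} \right) \right)}\, x^{p_1^\alpha} y^{p_2^\alpha} z^{p_3^\alpha}.$

When $\alpha \neq 0$, the map $a \mapsto a^\alpha$ is invertible and convex/concave, and the above is
a composition of this map with several Legendre transformations, so we can invert
this composition to obtain

$S(p_1) + (1-p_1)^\alpha\, S\big(\tfrac{p_2}{1-p_1}\big) = S(p_1+p_2) + (p_1+p_2)^\alpha\, S\big(\tfrac{p_1}{p_1+p_2}\big),$

which is exactly the $\alpha$-associativity condition. □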



It is worth pointing out at this point that in the above deformed Witt
construction, we have replaced the energy functional

$p \log x + (1-p) \log y$

with

$p^\alpha \log x + (1-p)^\alpha \log y$

according to our interpretation in §5. In the setting of nonextensive statistical
mechanics built upon the Tsallis entropy, this latter expression is exactly the energy
functional used. Therefore, the deformed Witt addition is again naturally
interpreted as a free energy, now in the more general $q$-deformed thermodynamics.
8. The Kullback-Leibler divergence

We now discuss another class of thermodynamic semirings in which both the
associativity and the commutativity properties fail, but in which we can encode
entropy functions varying over some underlying space or manifold. In particular,
we will connect the thermodynamic semirings we consider in this section to the
general point of view of information geometry, as developed in [3], [22].

The Kullback-Leibler divergence [27], [28] is a measure of relative entropy,
measured by the average logarithmic difference between two probability distributions $p$
and $q$. Since the averaging is done with respect to one of the probability
distributions, the KL divergence is not a symmetric function of $p$ and $q$.

More precisely, the KL divergence of two binary probability distributions $p$ and
$q$ is defined as

(8.1) $\mathrm{KL}(p; q) = p \log\frac{p}{q} + (1-p) \log\frac{1-p}{1-q}.$

The negative of the Kullback-Leibler divergence reduces to the Shannon entropy
(up to a constant) in the case where $q$ is a uniform distribution. It is also called
the information gain, in the sense that it measures the probability law $p$ relative
to a given input or reference probability $q$.

We are especially interested here in considering the case where the probability
distribution $q$ depends on an underlying space of parameters, continuously or
smoothly. Mostly, we will be considering the following two cases.

Definition 8.1. A smooth univariate binary statistical n-manifold $Q$ is a set of
binary probability distributions $Q = (q(\eta))$ smoothly parametrized by $\eta \in \mathbb{R}^n$.

A topological univariate binary statistical n-space $Q$ is a set of binary probability
distributions $Q = (q(\eta))$ continuously parameterized by $\eta \in \Xi$, with $\Xi$ a compact
Hausdorff topological space.

The first case leads to the setting of information geometry [3], [22], while the
second case is more suitable for treating multifractal systems [5].

We then consider thermodynamic semirings in the more general form of Definition
4.4. Let $X$ be either a compact subset of $\mathbb{R}^n$ in the case of a smooth univariate
binary statistical manifold or a closed subset of a compact Hausdorff space $\Xi$ in the
topological case of Definition 8.1. We consider the space of continuous functions
$\mathcal{R} = C(X, K)$, where the semiring $K$ is either $\mathbb{R}^{\min,+} \cup \{\infty\}$ or $\mathbb{R}^{\max,*}_{\geq 0}$, or in the
smooth case we take $\mathcal{R} = C^\infty(X, K)$.

Given $q = q(\eta)$ in $Q$, we can endow the space $\mathcal{R}$ of functions with a thermodynamic
semiring structure as in Definition 4.4, where the deformed addition operation is
given by

(8.2) $x(\eta) \oplus_{\mathrm{KL}_q,\rho} y(\eta) = \sum_{p \in \mathbb{Q} \cap [0,1]} \rho^{-\mathrm{KL}(p; q(\eta))}\, x(\eta)^p\, y(\eta)^{1-p},$

where $\rho$ is the parameter of the deformation. Note we use the negative of the KL
divergence because we are interested in it as a measure of relative entropy, rather
than relative information, concepts often conceptually distinct but always related
by a minus sign.
In the case when $q(\eta) = 1/2$ is uniform for all $\eta$, we obtain back the original case
with the Shannon entropy up to a shift factor:

$x \oplus_{\mathrm{KL}_q,\rho} y \,\big|_{q(\eta)=1/2} = \max_p \Big( -\rho \Big( p \log\frac{p}{1/2} + (1-p) \log\frac{1-p}{1/2} \Big) + px + (1-p)y \Big)$

(8.3) $= \max_p \big( \rho\, \mathrm{Sh}(p) + px + (1-p)y \big) - \rho \log 2.$

We note that we can calculate this operation explicitly over $\mathbb{R}^{\min,+} \cup \{\infty\}$ and
$\mathbb{R}^{\max,*}_{\geq 0}$. We obtain the following result, by arguing as in Proposition 4.3.

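Proposition 8.2. We have the following expression over $\mathbb{R}^{\min,+} \cup \{\infty\}$

$x \oplus_{\mathrm{KL}_q} y = -T \log\big( q\, e^{-x/T} + (1-q)\, e^{-y/T} \big),$

and the following expression over $\mathbb{R}^{\max,*}_{\geq 0}$

$x \oplus_{\mathrm{KL}_q} y = \big( q\, x^{1/T} + (1-q)\, y^{1/T} \big)^T.$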

The first observation then is that the additive structures (8.2) are in general not
commutative.

Proposition 8.3. The thermodynamic semiring structure

(8.4) $x \oplus_{\mathrm{KL}_q} y = \sum_{p \in \mathbb{Q} \cap [0,1]} \rho^{-\mathrm{KL}(p; q)}\, x^p\, y^{1-p}$

is commutative if and only if $q = 1/2$. The lack of commutativity is measured by
the transformation $q \mapsto 1 - q$.

Proof. This is immediate from the previous calculation, but we perform the proof
over general $K$. We find

$\mathrm{KL}(1-p; q) = (1-p) \log\frac{1-p}{q} + p \log\frac{p}{1-q}.$

This is related to $\mathrm{KL}(p; q)$ by the transformation $q \mapsto 1 - q$. Thus, $\mathrm{KL}(p; q) =
\mathrm{KL}(1-p; q)$ for all $p$ when $\log\frac{q}{1-q} = 0$, that is, when $q = 1/2$. This is exactly when
the Shannon entropy case is reproduced, so the only case when the addition (8.4)
based on the Kullback-Leibler divergence is commutative is when it agrees with
the Shannon entropy up to a shift factor. □
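The commutativity statement can be checked numerically in the min-plus picture. The sketch below assumes the convention $x \oplus y = \min_p (px + (1-p)y + T\,\mathrm{KL}(p;q))$, i.e. the information measure $S = -\mathrm{KL}(\cdot\,;q)$, and compares a grid minimum with the closed form given by the Gibbs variational principle, $-T\log(q\,e^{-x/T} + (1-q)\,e^{-y/T})$. That closed form is derived here under this assumed convention; the function names and sample values are illustrative.

```python
import math


def kl(p, q):
    """Binary KL divergence KL(p; q) in nats, reading 0*log(0) as 0."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total


def kl_sum_grid(x, y, q, T, steps=20000):
    """Grid approximation of min over p of p*x + (1-p)*y + T*KL(p; q)."""
    return min(
        (k / steps) * x + (1.0 - k / steps) * y + T * kl(k / steps, q)
        for k in range(steps + 1)
    )


def kl_sum_closed(x, y, q, T):
    """Closed form of the same minimum (Gibbs variational principle)."""
    return -T * math.log(q * math.exp(-x / T) + (1.0 - q) * math.exp(-y / T))


# Illustrative values (assumed).
x, y, T = 1.0, 2.5, 0.7

grid_gap = abs(kl_sum_grid(x, y, 0.3, T) - kl_sum_closed(x, y, 0.3, T))
comm_defect_fair = kl_sum_closed(x, y, 0.5, T) - kl_sum_closed(y, x, 0.5, T)
comm_defect_biased = kl_sum_closed(x, y, 0.3, T) - kl_sum_closed(y, x, 0.3, T)
# Swapping the arguments is compensated by the involution q -> 1 - q.
swap_defect = kl_sum_closed(x, y, 0.3, T) - kl_sum_closed(y, x, 1.0 - 0.3, T)
```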

For the associativity condition we find the following result.

Proposition 8.4. The lack of associativity of the thermodynamic semiring (8.4)
is measured by the transformation $(p_1, p_2, p_3; q) \mapsto (p_3, p_2, p_1; 1 - q)$.

Proof. Again we proceed over general $K$. With $p_3 = 1 - p_1 - p_2$, we have

$\mathrm{KL}(p_1; q) + (1-p_1)\,\mathrm{KL}\big(\tfrac{p_2}{1-p_1}; q\big)$

$= p_1 \log\frac{p_1}{q} + (1-p_1) \log\frac{1-p_1}{1-q} + p_2 \log\frac{p_2}{(1-p_1)\,q} + (1-p_1-p_2) \log\frac{1-p_1-p_2}{(1-p_1)(1-q)}$

$= p_1 \log\frac{p_1}{q} + (1-p_1) \log\frac{1-p_1}{1-q} + p_3 \log\frac{p_3}{1-q} + p_2 \log\frac{p_2}{q} - (1-p_1) \log(1-p_1),$

while

$\mathrm{KL}(p_1+p_2; q) + (p_1+p_2)\,\mathrm{KL}\big(\tfrac{p_1}{p_1+p_2}; q\big)$

$= (p_1+p_2) \log\frac{p_1+p_2}{q} + (1-p_1-p_2) \log\frac{1-p_1-p_2}{1-q} + (p_1+p_2) \Big[ \frac{p_1}{p_1+p_2} \log\frac{p_1}{(p_1+p_2)\,q} + \frac{p_2}{p_1+p_2} \log\frac{p_2}{(p_1+p_2)(1-q)} \Big]$

$= (p_1+p_2) \log\frac{p_1+p_2}{q} + (1-p_1-p_2) \log\frac{1-p_1-p_2}{1-q} + p_1 \log\frac{p_1}{q} + p_2 \log\frac{p_2}{1-q} - (p_1+p_2) \log(p_1+p_2)$

$= p_3 \log\frac{p_3}{1-q} + (1-p_3) \log\frac{1-p_3}{q} + p_1 \log\frac{p_1}{q} + p_2 \log\frac{p_2}{1-q} - (1-p_3) \log(1-p_3).$

These are related by the transformation $(p_1, p_2, p_3; q) \mapsto (p_3, p_2, p_1; 1 - q)$. □



Notice that, because of the presence of the shift in (8.3) with respect to the
Shannon entropy, in the case $q = 1/2$ we find

$\mathrm{KL}(p_1; \tfrac{1}{2}) + (1-p_1)\,\mathrm{KL}\big(\tfrac{p_2}{1-p_1}; \tfrac{1}{2}\big) = p_1 \log p_1 + p_2 \log p_2 + p_3 \log p_3 + \log 2 + (1-p_1) \log 2,$

while

$\mathrm{KL}(p_1+p_2; \tfrac{1}{2}) + (p_1+p_2)\,\mathrm{KL}\big(\tfrac{p_1}{p_1+p_2}; \tfrac{1}{2}\big) = p_1 \log p_1 + p_2 \log p_2 + p_3 \log p_3 + \log 2 + (1-p_3) \log 2.$

Thus, associativity is not automatically obtained in the uniform distribution
case, but instead we have associativity up to a shift.
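Both the transformation measuring the lack of associativity and the shift in the uniform case can be verified numerically. This is a minimal sketch using the binary KL divergence of (8.1); the helper names `left` and `right` and the sample point are hypothetical.

```python
import math


def kl(p, q):
    """Binary KL divergence KL(p; q) in nats, reading 0*log(0) as 0."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total


def left(p1, p2, q):
    """KL(p1; q) + (1 - p1) * KL(p2 / (1 - p1); q)."""
    return kl(p1, q) + (1.0 - p1) * kl(p2 / (1.0 - p1), q)


def right(p1, p2, q):
    """KL(p1 + p2; q) + (p1 + p2) * KL(p1 / (p1 + p2); q)."""
    return kl(p1 + p2, q) + (p1 + p2) * kl(p1 / (p1 + p2), q)


# Illustrative point of the simplex and reference probability (assumed values).
p1, p2, p3 = 0.2, 0.3, 0.5
q = 0.3

# (p1, p2, p3; q) -> (p3, p2, p1; 1 - q) maps one side to the other.
transform_defect = left(p3, p2, 1.0 - q) - right(p1, p2, q)

# For q = 1/2 the two sides differ by the shift (p3 - p1) * log 2.
uniform_shift = left(p1, p2, 0.5) - right(p1, p2, 0.5)
expected_shift = (p3 - p1) * math.log(2.0)
```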

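By Proposition 8.4 we see that, in the case of a thermodynamic semiring $\mathcal{R} =
C(X, K)$ or $\mathcal{R} = C^\infty(X, K)$, for a topological or smooth univariate binary statistical
space, if one can find an involution $\alpha : X \to X$ of the parameter space such that
$q(\alpha(\eta)) = 1 - q(\eta)$, then one can consider the transformation $x(\eta) \mapsto x(\alpha(\eta))$ and
one finds

$x(\eta) \oplus_{\mathrm{KL}_{q(\eta)}} y(\eta) = y(\alpha(\eta)) \oplus_{\mathrm{KL}_{q(\alpha(\eta))}} x(\alpha(\eta)).$

Moreover, the morphism

$A : (x(\eta), y(\eta), z(\eta)) \mapsto (z(\alpha(\eta)), y(\alpha(\eta)), x(\alpha(\eta)))$

measures the lack of associativity, by making the diagram commute,

$\begin{array}{ccccc}
\mathcal{R} \otimes \mathcal{R} \otimes \mathcal{R} & & \xrightarrow{\ A\ } & & \mathcal{R} \otimes \mathcal{R} \otimes \mathcal{R} \\
{\scriptstyle \oplus_{\mathrm{KL}} \otimes 1} \downarrow & & & & \downarrow {\scriptstyle 1 \otimes \oplus_{\mathrm{KL}}} \\
\mathcal{R} \otimes \mathcal{R} & \xrightarrow{\ \oplus_{\mathrm{KL}}\ } & \mathcal{R} & \xleftarrow{\ \oplus_{\mathrm{KL}}\ } & \mathcal{R} \otimes \mathcal{R}
\end{array}$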
8.1. Applications to multifractal systems. Consider the case of a Cantor set
$X$ identified, through its symbolic dynamics interpretation, as the one sided full
shift space on the alphabet $\{0, 1\}$, see §1.3 of [42].

For $\eta \in X$, let $a_n(\eta)$ denote the number of 1's that appear in the first $n$ digits
$\eta_1, \dots, \eta_n$ of $\eta$. We set

(8.5) $q(\eta) = \lim_{n \to \infty} \frac{a_n(\eta)}{n}$

if this limit exists. We denote by $\mathcal{Y} \subset X$ the set of points for which the limit (8.5)
exists.

The limit (8.5) determines several important dynamical properties related to
the fractal geometry of $X$. For example, suppose that $X$ is a uniform Cantor
set obtained from a contraction map $f$ with contraction ratio $\lambda$, endowed with a
Bernoulli measure $\mu_p$ for a given $0 < p < 1$, defined by assigning measure

$\mu_p(X(w_1, \dots, w_n)) = p^{a_n(w)} (1-p)^{n - a_n(w)}$

to the cylinder sets

$X(w_1, \dots, w_n) = \{ \eta \in X \mid \eta_i = w_i,\ i = 1, \dots, n \}.$

Then, the local dimension of $X$ at a point $\eta \in \mathcal{Y}$ is given by (§4.17 of [42])

$d_{\mu_p}(\eta) = \frac{q(\eta) \log p + (1 - q(\eta)) \log(1-p)}{\log \lambda},$

while the local entropy of the map $f$ is given by (§4.18 of [42])

$h_{\mu_p, f}(\eta) = -\big( q(\eta) \log p + (1 - q(\eta)) \log(1-p) \big).$

For a non-uniform Cantor set $X$ with two contraction ratios $\lambda_1$ and $\lambda_2$ on the two
intervals, the Lyapunov exponent of $f$ is given by (§4.20 of [42])

$\lambda_f(\eta) = q(\eta) \log \lambda_1 + (1 - q(\eta)) \log \lambda_2.$

One knows that, given a Bernoulli measure $\mu_p$ on the Cantor set $X$, there is a
set $Z \subset X$ of full measure $\mu_p(Z) = 1$, for which $q(\eta) = p$ (Proposition 4.5 of [42]).
The choice of the uniform measure $\mu_{1/2}$ yields a full measure subset $Z_{1/2}$ on which
the limit $q(\eta) = 1/2$ is the uniform distribution (the fair coin case). In general one
can stratify the set $\mathcal{Y} \subset X$ into level sets of $q(\eta)$. This provides a decomposition of
the Cantor set as a multifractal.

Looking at this setting from the point of view of thermodynamic semirings
suggests considering the set of functions $C(\mathcal{Y}, K)$ endowed with the pointwise operation
$\oplus_{\mathrm{KL}_{q(\eta)},T}$, with the Kullback-Leibler divergence $\mathrm{KL}(p; q(\eta))$, for $q(\eta)$ defined as in
(8.5). Then we see that, without the need to choose a measure on $X$, the
algebraic properties of the thermodynamic semiring automatically select the "fair coin
subfractal" $Z_{1/2}$.

Proposition 8.5. For $Z \subset \mathcal{Y}$, the semiring $C(Z, K)$, with the operation $\oplus_{\mathrm{KL}_{q(\eta)},T}$,
for $q(\eta)$ as in (8.5), is commutative if and only if $Z \subset Z_{1/2}$ is a "fair coin" subset.

Proof. It follows immediately from Proposition 8.3. □

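Moreover, we can see geometrically the involution that measures the lack of
commutativity as in Proposition 8.3 and the lack of associativity as in Proposition
8.4.

Proposition 8.6. The homeomorphism $\gamma : X \to X$ given by the involution that
exchanges $0 \leftrightarrow 1$ in the digits of $\eta$ in the shift space implements the involution
$q(\eta) \mapsto 1 - q(\eta)$ that measures the lack of commutativity and that, together with the
involution $(p_1, p_2, p_3) \mapsto (p_3, p_2, p_1)$, also measures the lack of associativity. Thus,
the morphism $x(\eta) \mapsto x(\gamma(\eta))$ restores commutativity, in the sense that

$x(\eta) \oplus_{\mathrm{KL}_{q(\eta)}} y(\eta) = y(\gamma(\eta)) \oplus_{\mathrm{KL}_{q(\gamma(\eta))}} x(\gamma(\eta)),$

while $A : \mathcal{R} \otimes \mathcal{R} \otimes \mathcal{R} \to \mathcal{R} \otimes \mathcal{R} \otimes \mathcal{R}$ given by

$A(x(\eta), y(\eta), z(\eta)) = (z(\gamma(\eta)), y(\gamma(\eta)), x(\gamma(\eta)))$

restores associativity, making the diagram commute

$\begin{array}{ccccc}
\mathcal{R} \otimes \mathcal{R} \otimes \mathcal{R} & & \xrightarrow{\ A\ } & & \mathcal{R} \otimes \mathcal{R} \otimes \mathcal{R} \\
\downarrow & & & & \downarrow \\
\mathcal{R} \otimes \mathcal{R} & \longrightarrow & \mathcal{R} & \longleftarrow & \mathcal{R} \otimes \mathcal{R}
\end{array}$

Proof. This follows immediately from Proposition 8.3 and Proposition 8.4 by
observing that the $q(\eta)$ defined as in (8.5) satisfies $q(\gamma(\eta)) = 1 - q(\eta)$, since $a_n(\gamma(\eta)) =
n - a_n(\eta)$ for all $\eta \in X$. □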
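On finite prefixes of a point of the shift space, the counting identity used in the proof is elementary to check. The sketch below is illustrative only: it represents a prefix of $\eta$ as a Python list of 0/1 digits (a hypothetical encoding) and uses the frequency $a_n(\eta)/n$ as a finite-$n$ stand-in for the limit (8.5).

```python
def ones_count(digits, n):
    """a_n(eta): number of 1's among the first n digits."""
    return sum(digits[:n])


def frequency(digits, n):
    """Finite-n approximation a_n(eta) / n of the limit q(eta)."""
    return ones_count(digits, n) / n


def flip(digits):
    """The involution gamma: exchange 0 <-> 1 in every digit."""
    return [1 - d for d in digits]


# A periodic sample point with digit pattern 1,0,0 (assumed for illustration),
# whose frequency of 1's is exactly 1/3.
eta = [1, 0, 0] * 1000
n = len(eta)

q_eta = frequency(eta, n)
q_flipped = frequency(flip(eta), n)
```

Since `ones_count(flip(eta), n) == n - ones_count(eta, n)` holds for every prefix length, the frequencies satisfy `q_flipped == 1 - q_eta` up to rounding, and the fair coin value $1/2$ is the only fixed point of this involution on frequencies.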



8.2. Multivariate binary statistical manifolds. We see that in the univariate 
case, the extremal p value is the unique probability distribution minimizing the KL- 
divergence to q subject to the soft constraint coming from the energy functional 
px + {1 — p)y, see [JS] This is important because minimizing the KL divergence 
is maximizing likelihood, and this plays an important role in marginal estimation, 
belief propagation, mutual information calculation, see [32] and [3]. 

A more interesting case is that of multivariate statistical manifolds. To maintain
the same features as in the univariate case, we will find that a hyperring structure is
most natural. See [54] for an introduction and relevant facts about hyperstructures.
We first note the following fact.

Proposition 8.7. If $p$ and $q$ are two distributions, each the product of its marginals,
we denote by $p_i$ and $q_i$ their $i$-th marginal distributions. Then $\mathrm{KL}(p; q) = \sum_i \mathrm{KL}(p_i; q_i)$.



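Proof. We have

$\mathrm{KL}(p; q) = p_1 \cdots p_n \log\frac{p_1 \cdots p_n}{q_1 \cdots q_n} + (1-p_1) p_2 \cdots p_n \log\frac{(1-p_1) p_2 \cdots p_n}{(1-q_1) q_2 \cdots q_n} + \cdots + (1-p_1) \cdots (1-p_n) \log\frac{(1-p_1) \cdots (1-p_n)}{(1-q_1) \cdots (1-q_n)}$

$= p_1 \cdots p_n \Big( \log\frac{p_1}{q_1} + \cdots + \log\frac{p_n}{q_n} \Big) + \cdots + (1-p_1) \cdots (1-p_n) \Big( \log\frac{1-p_1}{1-q_1} + \cdots + \log\frac{1-p_n}{1-q_n} \Big)$

$= p_1 \log\frac{p_1}{q_1} \big( p_2 \cdots p_n + (1-p_2) \cdots p_n + \cdots \big) + (1-p_1) \log\frac{1-p_1}{1-q_1} \big( p_2 \cdots p_n + \cdots \big) + \cdots$

$= p_1 \log\frac{p_1}{q_1} \big( (1 + p_2 - p_2)(p_3 \cdots p_n + \cdots) \big) + \cdots$

$= p_1 \log\frac{p_1}{q_1} + (1-p_1) \log\frac{1-p_1}{1-q_1} + \cdots + (1-p_n) \log\frac{1-p_n}{1-q_n} = \sum_i \mathrm{KL}(p_i; q_i).$

□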

Thus, if we can ensure that the sum of the KL divergences of the marginal 
distributions is minimized, then the total KL divergence will be minimized. 
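The additivity over marginals can be checked directly for small $n$. This is a minimal sketch, assuming that $p$ and $q$ are product distributions on $n$ binary variables (as in the computation above); the marginal values and helper names are hypothetical.

```python
import math
from itertools import product


def kl_binary(p, q):
    """Binary KL divergence KL(p; q) in nats, for 0 < p, q < 1."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def product_distribution(marginals):
    """Joint law on {0,1}^n with independent coordinates; P(bit_i = 1) = marginals[i]."""
    dist = {}
    for bits in product((1, 0), repeat=len(marginals)):
        prob = 1.0
        for bit, m in zip(bits, marginals):
            prob *= m if bit else 1.0 - m
        dist[bits] = prob
    return dist


def kl_joint(P, Q):
    """KL divergence between two distributions on the same finite set."""
    return sum(P[b] * math.log(P[b] / Q[b]) for b in P if P[b] > 0.0)


# Illustrative marginals (assumed values).
p_marg = [0.2, 0.5, 0.9]
q_marg = [0.3, 0.6, 0.4]

total = kl_joint(product_distribution(p_marg), product_distribution(q_marg))
sum_of_marginals = sum(kl_binary(p, q) for p, q in zip(p_marg, q_marg))
```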

8.3. Product of semirings and hyperfield structure. We proceed by taking
the semiring

$\mathcal{R} = C(\{1, \dots, n\}, K) = K^{\oplus n}.$

It is tempting to define the operations on $\mathcal{R}$ coordinate-wise. However, since we
want to consider an n-ary probability distribution and not n binary probability
distributions, there should be some dependence between coordinates that takes
advantage of the previous proposition. In short, we would like to put an ordering
on $\mathcal{R}$ that ensures the trace

$(x_1, \dots, x_n) \mapsto x_1 + \cdots + x_n \in K$

is maximized. This ordering does not uniquely determine a maximum between
two tuples. We thus forsake well-definedness of the addition on $K$ and define
$(x_1, \dots, x_n) + (y_1, \dots, y_n)$ to be the set of tuples $(z_1, \dots, z_n)$ with $z_i = x_i$ or $y_i$ that
maximize $z_1 + \cdots + z_n$ in the ordering on $K$. This, together with coordinate-wise
multiplication, defines a characteristic one hyperfield structure on $\mathcal{R}$. We then define
the Witt operation for some information measures $S_1, \dots, S_n$ over $K = \mathbb{R}^{\min,+} \cup \{\infty\}$
by

$x \oplus_{S_1, \dots, S_n} y = \min \big( p_1 x_1 + (1-p_1) y_1 - T S_1(p_1), \dots, p_n x_n + (1-p_n) y_n - T S_n(p_n) \big),$

where $x = (x_1, \dots, x_n)$, $y = (y_1, \dots, y_n)$, we now consider the $p_i$ as marginal
probabilities, and the min operation is the multivalued hyperring addition. When each
$S_i$ is the KL-divergence from some $q_i$, by the previous proposition, the results of
this operation are exactly the distributions with marginal probabilities $(p_1, \dots, p_n)$
minimizing the KL-divergence to the marginal probabilities $(q_1, \dots, q_n)$ subject to
the soft constraint coming from the energy functional

$U = \sum_i p_i x_i + (1 - p_i) y_i.$

The lack of well-definedness of this addition can be interpreted in the
thermodynamic context as the non-uniqueness of equilibria, via the existence of
meta-equilibrium states. Indeed, when the $q_i$ describe a uniform distribution, we find
that this addition is in fact well-defined.

Note that these hyperfields are slightly different from those considered in [54].
However, just as taking $T \to 0$ for the Shannon entropy semiring reproduces the
"dequantized" tropical semiring, we can take $T \to 0$ for the KL divergence
semiring to get a "dequantized" tropical hyperfield. This reproduces the undeformed
addition defined on $K$ above. Note that this is not the same as Oleg Viro's tropical
hyperfield discussed in [54].

We can encode more information about a space in the ring deformation by
restricting the marginal probabilities we sum over. In particular, we can restrict the
minimizing process to certain submanifolds of our probability manifold, such as the
e-flat or m-flat manifolds typically considered in [3], since the KL-divergence is
related to the Fisher information matrix defining the Riemannian structure. See also
the comments in §11.1 below.

9. The successor in thermodynamic semirings

Given a thermodynamic semiring in the sense of Definition 4.1, we let

(9.1) $\lambda(x, T) = x \oplus_S 0 = \min_p \big( px - T S(p) \big).$

Then $\lambda : K \times \mathbb{R} \to K$ is the Legendre transform of $TS : [0, 1] \to \mathbb{R}$. If we assume
that $S$ has a unique maximum, then we can invert the Legendre transform, so that

$T S(p) = \min_x \big( px - \lambda(x, T) \big).$

Therefore, when $S$ is concave/convex, we can recover it from the semiring. We call
$\lambda$ the successor function, since 0 is the multiplicative identity, and over general $K$
we can write $\lambda(x, T) = x \oplus_S 1$. When multiplication distributes over addition, we
can write

$x \oplus_S y = \lambda(x - y, T) + y.$

We will tend to suppress the $T$ dependence of $\lambda$. Each of the algebraic properties
of $S$ and $K$ translates into the language of $\lambda$.

Proposition 9.1. The entropy function $S$ has the following properties.

(1) It satisfies the commutativity axiom $S(p) = S(1-p)$ (hence $\oplus_{S,T}$ is
commutative) if and only if

(9.2) $\lambda(x) - \lambda(-x) = x.$

(2) It satisfies the left identity axiom $S(0) = 0$ (hence $\oplus_S$ has left identity $\infty$)
if and only if $\lambda(x) \leq 0$ and $\lim_{x \to \infty} \lambda(x) = 0$.

(3) It satisfies the right identity axiom $S(1) = 0$ (hence $\oplus_S$ has right identity $\infty$)
if and only if $\lambda(x) \leq x$ and $\lambda(x) \sim x$, as $x \to -\infty$.

(4) It satisfies the associativity constraint making $\oplus_S$ associative if and only if

$\lambda(x - \lambda(y)) + \lambda(y) = \lambda(\lambda(x - y) + y).$

Proof. Facts (1) and (4) are immediate from the definition. The properties (2) and
(3) arise from the fact that $\lambda$ should be continuous at $\infty$ and $-\infty$. We then read
$\infty \oplus_S x$ and $x \oplus_S \infty$ as $\lim_{y \to \infty} y \oplus_S x$ and $\lim_{y \to \infty} x \oplus_S y$, respectively. Each of these
should equal $x$, and in terms of $\lambda$ we see that $\lim_{y \to \infty} y \oplus_S x = \lim_{y \to \infty} \lambda(y - x) + x$
and $\lim_{y \to \infty} x \oplus_{S,T} y = \lim_{y \to \infty} \lambda(x - y) + y$, thus proving (2) and (3). □

In the case of the Shannon entropy $S = \mathrm{Sh}$ and KL-divergence $S = -\mathrm{KL}(p; q)$,
we have the following forms for the successor function.

Proposition 9.2. For Shannon entropy,

(9.3) $\lambda^{\mathrm{Sh}}(x, T) = -T \log\big( 1 + e^{-x/T} \big),$
over $\mathbb{R}^{\min,+} \cup \{\infty\}$, and

(9.4) $\lambda^{\mathrm{Sh}}(x, T) = \big( 1 + x^{1/T} \big)^T$
over $\mathbb{R}^{\max,*}_{\geq 0}$. For the KL-divergence,

(9.5) $\lambda^{\mathrm{KL}}(x, T) = -T \log\big( 1 - q + q\, e^{-x/T} \big),$
over $\mathbb{R}^{\min,+} \cup \{\infty\}$, and

(9.6) $\lambda^{\mathrm{KL}}(x, T) = \big( 1 - q + q\, x^{1/T} \big)^T,$

over $\mathbb{R}^{\max,*}_{\geq 0}$.

Proof. This follows directly from the definition of $\lambda(x, T) = x \oplus_S 0$, and the
isomorphism $-\log$ relating the semirings $\mathbb{R}^{\max,*}_{\geq 0}$ and $\mathbb{R}^{\min,+} \cup \{\infty\}$. □
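For the Shannon case the successor function is explicit, so its algebraic properties can be checked directly. The sketch below assumes the min-plus form $\lambda^{\mathrm{Sh}}(x,T) = -T\log(1 + e^{-x/T})$ together with the closed form $x \oplus_{\mathrm{Sh}} y = -T\log(e^{-x/T} + e^{-y/T})$; function names and sample values are illustrative.

```python
import math


def successor(x, T):
    """Shannon successor function lambda(x, T) = x (+) 0 over min-plus."""
    return -T * math.log(1.0 + math.exp(-x / T))


def witt_sum(x, y, T):
    """Shannon thermodynamic sum -T*log(exp(-x/T) + exp(-y/T))."""
    return -T * math.log(math.exp(-x / T) + math.exp(-y / T))


# Illustrative values (assumed).
x, y, T = 1.3, -0.4, 0.8

# The sum is recovered from the successor: x (+) y = lambda(x - y) + y.
recover_defect = witt_sum(x, y, T) - (successor(x - y, T) + y)

# Commutativity in the form lambda(x) - lambda(-x) = x.
comm_defect = successor(x, T) - successor(-x, T) - x

# Associativity: lambda(x - lambda(y)) + lambda(y) = lambda(lambda(x - y) + y).
assoc_defect = (successor(x - successor(y, T), T) + successor(y, T)) - successor(
    successor(x - y, T) + y, T
)
```

All three defects vanish up to rounding, reflecting that the Shannon semiring is commutative and associative.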

Figures 1, 2 and 3 show plots of $\lambda^{\mathrm{Sh}}$ versus $x$, for different
values of $T$.

Figure 1. The successor function $\lambda^{\mathrm{Sh}}$ for $T = 0.5$

Figure 2. The successor function $\lambda^{\mathrm{Sh}}$ for $T = 1$

Figure 3. The successor function $\lambda^{\mathrm{Sh}}$ for $T = 2$

9.1. Successor function for Tsallis entropy. Consider now the case of the
Tsallis entropy

$\mathrm{Ts}_\alpha(p) = \frac{1}{1-\alpha} \big( p^\alpha + (1-p)^\alpha - 1 \big).$

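Proposition 9.3. The successor function $\lambda^{\mathrm{Ts}_\alpha}(x, T)$ for the Tsallis entropy is
given by

(9.7) $\lambda^{\mathrm{Ts}_\alpha}(x, T) = \begin{cases} 0 & \left| \frac{\alpha}{1-\alpha} \right| < x/T \\ g(x) & -\left| \frac{\alpha}{1-\alpha} \right| < x/T < \left| \frac{\alpha}{1-\alpha} \right| \\ x & x/T < -\left| \frac{\alpha}{1-\alpha} \right| \end{cases}$

where $g(x)$ is given by applying $\mathrm{Ts}_\alpha$ to the inverse of its derivative, that is,
$g(x) = p^* x - T\,\mathrm{Ts}_\alpha(p^*)$ with $p^*$ the solution of $T\,\mathrm{Ts}_\alpha'(p^*) = x$.

Proof. We have

$\frac{d\,\mathrm{Ts}_\alpha}{dp} = \frac{\alpha}{1-\alpha} \big( p^{\alpha-1} - (1-p)^{\alpha-1} \big).$

We see the derivative of $\mathrm{Ts}_\alpha$ has range $\left[ -\left| \frac{\alpha}{1-\alpha} \right|, \left| \frac{\alpha}{1-\alpha} \right| \right]$, so that we obtain (9.7). □

Figure 4 shows an example of a plot of $\lambda^{\mathrm{Ts}_\alpha}$ versus $x$. In the limit $\alpha \to \infty$,
one has $\mathrm{Ts}_\alpha \to \chi_{(0,1)}$, so indeed $\lambda^{\mathrm{Ts}_\infty}(x) = x\, \chi_{[-\infty,0)}(x)$ for finite temperature.
When $\alpha < 0$, $\mathrm{Ts}_\alpha$ is convex, so $\lambda$ becomes concave in this region, as expected.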



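9.2. Successor function for Renyi entropy. We now consider again the Renyi
entropy given by

$\mathrm{Ry}_\alpha(p) = \frac{1}{1-\alpha} \log\big( p^\alpha + (1-p)^\alpha \big).$

We have

$\frac{d\,\mathrm{Ry}_\alpha}{dp} = \frac{\alpha}{1-\alpha} \cdot \frac{p^{\alpha-1} - (1-p)^{\alpha-1}}{p^\alpha + (1-p)^\alpha}.$

This time, however, the derivative has range $\mathbb{R}$, so that we have both $\lambda^{\mathrm{Ry}_\alpha}(x) < x$
and $\lambda^{\mathrm{Ry}_\alpha}(x) < 0$.

Figures 5 and 6 show plots of $\lambda^{\mathrm{Ry}_\alpha}$ versus $x$, for different
values of $\alpha$.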

One can see, by comparing these various graphs for the different entropy func- 
tions that increasing T has the effect of smoothing the transition, while increasing 
a sharpens it. 
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Figure 4. The successor function A„ for a = 0.5 and T = 1 




Figure 5. The successor function $\lambda^{\mathrm{Ry}_\alpha}$ for $\alpha = 0.1$ and $T = 1$

9.3. Cumulant generating function. In this section we give a thermodynamical
interpretation of the successor function.

Recall that, for a random variable $X$, if $M_X(t)$ denotes the generating function
for the moments of $X$,

\[ M_X(t) = \langle \exp(tX) \rangle = \sum_{m=0}^{\infty} \mu_m \frac{t^m}{m!}, \]

then the cumulants $\{\kappa_n\}$ of $X$ are defined as the coefficients of the power series
expansion of the function $\log M_X(t)$,

\[ \log M_X(t) = \sum_{n=0}^{\infty} \kappa_n \frac{t^n}{n!}. \]

The information contained in cumulants or moments is equivalent, though cumulants
have the advantage that they behave additively over independent variables.

Figure 6. The successor function $\lambda^{\mathrm{Ry}_\alpha}$ for $\alpha = 0.9$ and $T = 1$



We then have the following result. We consider the case of an analytic $\lambda$, which
is reasonable when attempting to gain a thermodynamic understanding, as the
microscopic dynamics are usually assumed to be analytic.

Proposition 9.4. Let $\lambda(x,T)$ be the successor function of a thermodynamic semiring
$K$. Assume that $\lambda(x,T)$ is analytic. Then the function $-\lambda(x,T)/T$ is the cumulant
generating function of the probability distribution for the energy $E$, in the
variable $-1/T = -\beta$. Namely, if we write the $n$th cumulant as $\kappa_n = \langle E^n \rangle_c$, we
have

\[ (-1)^{n+1} \frac{\partial^n}{\partial \beta^n}\left(\beta \lambda(x,T)\right) = \langle E^n \rangle_c. \tag{9.8} \]

Proof. In thermodynamics, $Z(\beta) = \langle \exp(-\beta E) \rangle$ is the partition function, where
$\beta = 1/T$ is the inverse temperature and $E$ is the energy. The Helmholtz free
energy is then given by

\[ F = -T \log \langle \exp(-E/T) \rangle. \tag{9.9} \]

Up to a factor of —1/T, the Helmholtz free energy is in fact the cumulant generating 
function for the random variable given by the energy E. As observed already in 
fjni the Helmholtz free energy is the Legendre transform of the entropy, and can 
therefore be identified, again up to a factor of $-1/T$, with the function $\lambda(x,T)$. □
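To make the cumulant statement concrete, here is a small numerical sketch for the Shannon case, where $\beta\lambda(x, 1/\beta) = -\log(1 + e^{-\beta x})$ is minus the log partition function of a two-level system with energies $0$ and $x$. The two-level reading, the finite-difference helpers and the chosen values of $x$ and $\beta$ are assumptions for illustration.

```python
import math

def beta_lambda(beta, x):
    """beta * lambda(x, 1/beta) for Shannon entropy, i.e. -log Z(beta) with Z = 1 + exp(-beta*x)."""
    return -math.log1p(math.exp(-beta * x))

def derivative(f, t, h=1e-4):
    """Central finite-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

def second_derivative(f, t, h=1e-3):
    """Central finite-difference approximation of f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

x, beta = 1.3, 0.7
# Boltzmann probability of the level with energy x.
p = math.exp(-beta * x) / (1.0 + math.exp(-beta * x))
mean_E = p * x                     # first cumulant of the energy
var_E = p * (1.0 - p) * x * x      # second cumulant of the energy

# (9.8) with n = 1 and n = 2: signs (-1)^(n+1) are +1 and -1 respectively.
k1 = derivative(lambda b: beta_lambda(b, x), beta)
k2 = -second_derivative(lambda b: beta_lambda(b, x), beta)
print(k1, mean_E)
print(k2, var_E)
```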

We can of course perform this proof without reference to the thermodynamics.
That is to say, the Legendre transform structure of the whole construction is independent
of the information measure we select, as long as we select one which is concave and
analytic. In particular, from (9.8) we have

\[ \lambda(x,T) - T\frac{\partial}{\partial T}\lambda(x,T) = \langle E \rangle = p_{eq}\,x, \]

where $p_{eq} = p_T(x)$ is the equilibrium value of the mole fraction. We know that
$\lambda(x,T) = \min_p\left(px - TS(p)\right) = p_T(x)\,x - TS(p_T(x))$. We see that $p_T(x)$ satisfies

\[ x/T = \frac{d}{dp}S(p_T(x)), \]

so we can write $p_T(x) = p(x/T)$ and, abusing notation, $\lambda(x,T) = \lambda(x/T)$. Notice that this explains
the effect that changing the temperature has on $\oplus_{S,T}$.
From the definition, we calculate

\[ \frac{\partial}{\partial T}\lambda(x/T) = x\frac{\partial}{\partial T}p(x/T) - S(p(x/T)) - T\,\frac{\partial}{\partial T}p(x/T)\,\frac{d}{dp}S(p(x/T)), \]

which, by the above property, is just $-S(p(x/T))$, proving the above relation. Note
this holds for arbitrary smooth, concave entropy functions. Similarly, we calculate

\[ x\frac{\partial}{\partial x}\lambda(x/T) = x\,p(x/T), \]

so that

\[ \lambda(x/T) = x\frac{\partial}{\partial x}\lambda(x/T) + T\frac{\partial}{\partial T}\lambda(x/T). \]

This is a well-known property of the Legendre transform of smooth functions.
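The two identities of this passage, $\lambda - T\,\partial_T\lambda = p_{eq}\,x$ and $\lambda = x\,\partial_x\lambda + T\,\partial_T\lambda$, can be checked numerically in the Shannon case, where $\lambda(x,T) = -T\log(1+e^{-x/T})$ and $p_T(x) = 1/(1+e^{x/T})$. This is a sketch under those assumptions; the finite-difference helpers are hypothetical names introduced here.

```python
import math

def lam(x, T):
    """Shannon successor function lambda(x, T) = -T * log(1 + exp(-x/T))."""
    return -T * math.log1p(math.exp(-x / T))

def p_eq(x, T):
    """Minimizer of p*x - T*Sh(p), obtained from x/T = log((1-p)/p)."""
    return 1.0 / (1.0 + math.exp(x / T))

def d_dx(f, x, T, h=1e-5):
    """Central difference in the first argument."""
    return (f(x + h, T) - f(x - h, T)) / (2.0 * h)

def d_dT(f, x, T, h=1e-5):
    """Central difference in the second argument."""
    return (f(x, T + h) - f(x, T - h)) / (2.0 * h)

x, T = 0.8, 1.7
energy_side = lam(x, T) - T * d_dT(lam, x, T)          # should equal p_eq * x
euler_side = x * d_dx(lam, x, T) + T * d_dT(lam, x, T)  # should equal lambda
print(energy_side, p_eq(x, T) * x)
print(euler_side, lam(x, T))
```

The second identity is Euler's relation for a function that is homogeneous of degree one in $(x, T)$, which is what the scaling $p_T(x) = p(x/T)$ expresses.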

10. Entropy Operad 

A categorical and operadic point of view on convex spaces and entropy functions 
was recently proposed in [3] , [5] , [TH] , [H] . Here we will use a similar viewpoint to 
describe generalized associativity conditions on thermodynamic semirings. 

More precisely, we consider here the more general question of how binary (or
more complicated) information measures can be built up to ones for $n$-ary random
variables for any $n \geq 2$. This will give us some interesting correspondences between
the combinatorics of such "guessing games" and generalized associativity conditions
in an operad with $n$-ary operations defined over $K$ like $x_1 \oplus_S \cdots \oplus_S x_n$ with some
choice of parenthesizing. In this section, we will assume for simplicity that $K$ is
$\mathbb{R}^{\min,+} \cup \{\infty\}$, although, once again, this is only a notational convention chosen
to elucidate certain expressions. All the statements made here could be translated
into the greater generality for real characteristic one semifields.

Operads were first introduced in [32] in the theory of iterated loop spaces and 
have since seen a broad range of applications in algebra, topology, and geometry. 
We recall briefly some basic facts about operads that we will need later, see [55] . 

An operad is a collection of objects from a symmetric monoidal category $\mathcal{S}$ with
product $\otimes$ and unit object $\kappa$. In particular, for each $j \in \mathbb{N}$, we have an object $\mathcal{C}(j)$,
thought of as parameter objects for $j$-ary operations, with actions by the symmetric
group $\mathrm{Sym}_j$, thought of as permuting inputs. We also have a unit map $\eta : \kappa \to \mathcal{C}(1)$
and composition maps

\[ \gamma : \mathcal{C}(k) \otimes \mathcal{C}(j_1) \otimes \cdots \otimes \mathcal{C}(j_k) \to \mathcal{C}(j_1 + \cdots + j_k) \]

which are suitably associative, unital, and equivariant under the action of $\mathrm{Sym}_k$,
such that if $\sigma \in \mathrm{Sym}_k$,

\[ \gamma\left(c_k \otimes c_{\sigma(j_1)} \otimes \cdots \otimes c_{\sigma(j_k)}\right) = \gamma\left(\sigma(c_k) \otimes c_{j_1} \otimes \cdots \otimes c_{j_k}\right). \]

A $\mathcal{C}$-algebra $A$ is an object together with $\mathrm{Sym}_j$-equivariant maps

\[ \mathcal{C}(j) \otimes A^{j} \to A, \]

thought of as actions, which are suitably associative and unital. Here $A^j$ represents
$A^{\otimes j}$ and $A^0 = \kappa$. An $A$-module $M$ is an object together with $\mathrm{Sym}_{j-1}$-equivariant
maps

\[ \mathcal{C}(j) \otimes A^{j-1} \otimes M \to M \]

which are also suitably associative and unital. Note that we are taking our objects
all from symmetric monoidal categories, so we do not need to distinguish where 
the operad lives from where the algebras live, but we have not eliminated the 
possibility of doing so. When we consider the entropy operad, the n-ary operations 
of the operad will be parametrized by rooted trees, while we will take algebras from 
the category of topological categories. 

10.1. Operads and entropy. We first recall the recent construction of J. Baez,
T. Fritz, and T. Leinster [4], [5] of an operadic formalism for entropy, which is
especially relevant to our setting and nicely displays the basic machinery.

Using the set theorists' convention, we define natural numbers as $n = \{0, \ldots, n-1\}$.
An ordered $n$-tuple will then be denoted as $(a_i)_{i \in n} = (a_0, \ldots, a_{n-1})$. Consider
as our symmetric monoidal category the category of topological categories, denoted
Cat(Top), with $\otimes$ as the Cartesian product, and $\kappa$ as the one-point space. One can
construct an operad, $\mathcal{P}$, out of probability distributions on finite sets. For each
$j$, we define $\mathcal{P}(j)$ as the set of $j$-ary probability distributions, thought of as the
$(j-1)$-simplex, $\Delta_{j-1} \subset \mathbb{R}^j$, and given the subspace topology. If $(p_i)_{i \in j} \in \mathcal{P}(j)$
and, for $i \in j$, $(q_{il})_{l \in k_i} \in \mathcal{P}(k_i)$, we let

\[ \gamma\left((p_i)_{i \in j} \otimes (q_{0l})_{l \in k_0} \otimes \cdots \otimes (q_{j-1,l})_{l \in k_{j-1}}\right) = (p_i q_{il})_{l \in k_i,\, i \in j} \in \mathcal{P}(k_0 + \cdots + k_{j-1}). \]

Basically, this says that, given a $j$-ary variable $X \in (x_i)_{i \in j}$ with probability
distribution $(p_i)_{i \in j}$, we refine the possible values of $X$, splitting up each $x_i$.

As a heuristic description of this procedure, imagine we are measuring physical
systems and have suddenly discovered how to measure spin or some other quantity
that we were ignorant of before. Now there are more distinguishable states that
we can measure. We know the probability distribution of these new states given
an old state $x_i$: it is $(q_{il})_{l \in k_i}$, corresponding to new distinguishable states $(x_{il})_{l \in k_i}$.
Now $X \in (x_{il})_{l \in k_i,\, i \in j}$ may take any of $k_0 + \cdots + k_{j-1}$ values and must have the
probability distribution $(p_i q_{il})_{l \in k_i,\, i \in j}$. We see the unit in this operad is the unique
probability distribution $(1) \in \mathcal{P}(1)$.
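The composition in the operad of probability distributions can be sketched in a few lines. Distributions are represented as plain Python lists; the function name `compose` and the error handling are illustrative assumptions.

```python
def compose(p, qs):
    """Operadic composition: refine outcome i of p by the distribution qs[i].

    Returns the flat list (p_i * q_il), ordered first by i and then by l.
    """
    if len(p) != len(qs):
        raise ValueError("need exactly one refining distribution per outcome of p")
    return [p_i * q_il for p_i, q in zip(p, qs) for q_il in q]

# A fair coin whose second outcome is refined into two sub-outcomes.
p = [0.5, 0.5]
refined = compose(p, [[1.0], [0.25, 0.75]])
print(refined)

# The one-point distribution (1) acts as the unit on both sides.
print(compose([1.0], [p]))
print(compose(p, [[1.0], [1.0]]))
```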

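An important $\mathcal{P}$-algebra in Cat(Top) is given by the additive monoid $\mathbb{R}_{\geq 0}$. As
a category, $\mathbb{R}_{\geq 0}$ is regarded as the one-object category; the operad $\mathcal{P}$ acts trivially
on objects since there is only one object. On maps, that is, on real numbers, we
have

\[ p(x_1, \ldots, x_n) = \sum_{i} p_i x_i. \]

Since $\mathcal{P}$-algebras $A$ are also categories, we can define an internal $\mathcal{P}$-algebra in $A$
as a lax map $1 \to A$ of $\mathcal{P}$-algebras, where $1$ is the terminal $\mathcal{P}$-algebra in Cat (see [4],
[5] for details). This basically is an object $a \in A$ and, for each $p \in \mathcal{P}(j)$, a map
$\alpha_p : p(a, \ldots, a) \to a$ such that

\[ \alpha_{p \circ (q_1, \ldots, q_n)} = \alpha_p \circ p(\alpha_{q_1}, \ldots, \alpha_{q_n}) \quad \text{for every } p \in \mathcal{P}(n) \text{ and } q_i \in \mathcal{P}(m_i), \]

\[ \alpha_{\sigma p} = \alpha_p \quad \text{for every } p \in \mathcal{P}(n) \text{ and } \sigma \in \mathrm{Sym}_n, \]

\[ \alpha_1 = \eta. \]

For $\mathbb{R}_{\geq 0}$ there is only one object, so $a = \mathbb{R}_{\geq 0}$ and $\alpha$ is a map taking probability
distributions to nonnegative real numbers satisfying the following four axioms:

(1) For every $p \in \mathcal{P}(n)$ and $q_i \in \mathcal{P}(m_i)$,
\[ \alpha(p \circ (q_1, \ldots, q_n)) = \alpha(p) + \sum_{i=1}^{n} p_i\,\alpha(q_i). \]

(2) $\alpha((1)) = 0$.

(3) $\alpha(\sigma p) = \alpha(p)$ for every $p \in \mathcal{P}(n)$ and $\sigma \in \mathrm{Sym}_n$.

(4) $\alpha : \mathcal{P}(n) \to \mathbb{R}_{\geq 0}$ is continuous for all $n$.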

Note that, in the first of these, composition of maps in the one-object category
$\mathbb{R}_{\geq 0}$ is addition of real numbers. We require the last one since we are looking for
functoriality in Cat(Top). As it turns out (see [4], [5]), by Faddeev's theorem [17],
the only function satisfying these axioms, up to positive scalar multiples, is the
Shannon entropy, Sh.
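The composition axiom, $\alpha(p \circ (q_1, \ldots, q_n)) = \alpha(p) + \sum_i p_i\,\alpha(q_i)$, is easy to test numerically. The sketch below checks it for the Shannon entropy and, for contrast, shows that the $n$-ary Tsallis entropy with $\alpha = 2$, taken here in its standard form $1 - \sum_i p_i^2$ (an assumption not spelled out in this section), violates it. All function names are illustrative.

```python
import math

def compose(p, qs):
    """Operadic composition of distributions: the flat list (p_i * q_il)."""
    return [p_i * q_il for p_i, q in zip(p, qs) for q_il in q]

def shannon(dist):
    """n-ary Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)

def tsallis2(dist):
    """n-ary Tsallis entropy with alpha = 2, i.e. 1 - sum of p_i^2."""
    return 1.0 - sum(p * p for p in dist)

def chain_rule_gap(a, p, qs):
    """a(p o (q_1..q_n)) - (a(p) + sum_i p_i * a(q_i)); zero when the axiom holds."""
    return a(compose(p, qs)) - (a(p) + sum(p_i * a(q) for p_i, q in zip(p, qs)))

p = [0.5, 0.5]
qs = [[1.0], [0.5, 0.5]]
gap_shannon = chain_rule_gap(shannon, p, qs)
gap_tsallis = chain_rule_gap(tsallis2, p, qs)
print(gap_shannon)   # numerically zero
print(gap_tsallis)   # nonzero: the weights p_i are not the right ones for Tsallis
```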

10.2. Binary guessing trees. Consider now a general binary information measure,
$S : [0,1] \to \mathbb{R}_{\geq 0}$. We will assume that $S$ satisfies the identity axioms, so
that we can keep our approach finite rather than full of infinite amounts of trivial
flotsam. We can build an information measure on ternary variables in several ways.
For example, if we are trying to guess at the value of $X$, which we know must be in
$\{x_1, x_2, x_3\}$, using only yes-or-no questions, we could employ one of the following
two strategies:

(1) Is $X = x_1$? If not, is $X = x_2$?

(2) Is $X = x_1$ or $x_2$? If yes, is $X = x_1$?

Indeed, we see that any strategy that avoids asking trivial or irrelevant questions
arises as one of these strategies with a permutation of $\{1,2,3\}$. This gives us $2 \cdot 3! = 12$
possible ternary information measures. There is a useful way of parametrizing
these guessing strategies with rooted trees.

Proposition 10.1. Let $S$ be a binary information measure with identity. For each
$n \geq 2$, there is a one-to-one correspondence between rooted full binary trees with $n$
leaves with labels in $\{1, \ldots, n\}$ and $n$-ary information measures arising from $S$.

Proof. Let $T$ be a tree as above. We call such a tree an $(n,2)$-tree. What it means
to be full is that every vertex is either a leaf or has two children. We will see that
eliminating the single-child nodes is equivalent to eliminating trivial and irrelevant
questions from our set of possible questions, making it finite. To see what the
set of possible questions is, consider that, if at a certain time we are certain that
$X \in \{x_1, \ldots, x_n\}$, the yes-or-no questions available to us are exactly those of the
form "is $X \in A$?", where $A$ is a subset of $\{x_1, \ldots, x_n\}$. We label the leaves of
$T$ with the possible values of $X$ according to their original labels ($i \mapsto x_i$). The
vertices which are not leaves are uniquely labeled with the list of $x_i$ which label
the leaves of their subtree. The vertices will represent states of our knowledge of $X$,
in that the labels will denote the possible values of $X$ given what we have already
measured. Naturally, we begin at the root vertex, sure only that $X$ is one of the $x_i$.
At any vertex which is not a leaf, there are two child subtrees: a left one, $L$, and
a right one, $D$. Abusing notation, let $L$ also denote the set of leaf labels of the left
subtree, and $D$ that of the right subtree. The true value
of $X$ must lie in either $L$ or $D$. Our question then is "is $X \in L$?". If the answer is
yes, we move to the left child. If the answer is no, we move to the right child. At
a leaf, we have ruled out all the possible values of $X$ except the one labeling our
current vertex.

As an example, consider the rooted full binary tree in Figure 7.

Figure 7. A rooted full binary tree



We see $X$ lies in $\{x_1, \ldots, x_6\}$. Our strategy goes like this:

1. Our first question is "is $X = x_2$?".

1.1. If yes, we are done; $X = x_2$.

1.2. If no, we ask "is $X \in \{x_1, x_3, x_4\}$?".

1.2.1. If yes, we ask "is $X = x_1$?".

1.2.1.1. If yes, we are done; $X = x_1$.

1.2.1.2. If no, we ask "is $X = x_4$?".

1.2.1.2.1. If yes, we are done; $X = x_4$.

1.2.1.2.2. If no, we are also done; $X = x_3$.

1.2.2. If no, we ask "is $X = x_5$?".

1.2.2.1. If yes, we are done; $X = x_5$.

1.2.2.2. If no, we are also done; $X = x_6$.

Suppose these possible values occur with probabilities $p_1, \ldots, p_6$. We see that
the information measure corresponding to the above tree is

\[ S(p_2) + (1-p_2)\,S\!\left(\frac{p_1+p_4+p_3}{1-p_2}\right) + (p_1+p_4+p_3)\,S\!\left(\frac{p_1}{p_1+p_4+p_3}\right) + (p_4+p_3)\,S\!\left(\frac{p_4}{p_4+p_3}\right) + (p_5+p_6)\,S\!\left(\frac{p_5}{p_5+p_6}\right). \]

Note that permuting the labels of the leaves permutes the $p_i$.
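The passage from a guessing tree to its information measure is a simple recursion: each internal vertex contributes its total probability mass times $S$ evaluated at the relative probability of a "yes" answer. The following sketch encodes the tree of Figure 7 as nested pairs (an illustrative encoding chosen here) and checks that, when $S$ is the Shannon entropy, the result is the Shannon entropy of the full distribution. The probability values are arbitrary test data.

```python
import math

def shannon(p):
    """Binary Shannon entropy in nats, with 0*log(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def mass(tree, probs):
    """Total probability of the leaves below a vertex; a leaf is an integer label."""
    if isinstance(tree, int):
        return probs[tree]
    left, right = tree
    return mass(left, probs) + mass(right, probs)

def tree_entropy(tree, probs, S=shannon):
    """Information measure S_T built from the binary measure S along the tree."""
    if isinstance(tree, int):
        return 0.0
    left, right = tree
    m_left = mass(left, probs)
    m = m_left + mass(right, probs)
    if m == 0.0:
        return 0.0
    # The question at this vertex is "is X in the left subtree?".
    return m * S(m_left / m) + tree_entropy(left, probs, S) + tree_entropy(right, probs, S)

# Figure 7: ask x2 first, then {x1, x3, x4}, then x1, then x4; otherwise x5 versus x6.
FIGURE_7 = (2, ((1, (4, 3)), (5, 6)))
probs = {1: 0.10, 2: 0.20, 3: 0.15, 4: 0.05, 5: 0.30, 6: 0.20}

total = tree_entropy(FIGURE_7, probs)
direct = -sum(p * math.log(p) for p in probs.values())
print(total, direct)
```

Passing a different binary measure as `S` gives the corresponding tree-dependent information measure; only for special choices such as Shannon is the result independent of the tree.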

Conversely, since any question is of the form "is $X \in A$?" for some subset $A$, we
can build our tree inductively, identifying $A$ with $L$ at a given vertex, and labeling
with the possible values of $X$ as we go, beginning with the root. Any guessing
strategy must exhaust the possibilities for $X$, so any tree constructed in this way will
be a well-defined $(n,2)$-tree. Clearly this is the inverse process to the one described
above. As an example, suppose we want to guess at an $X \in \{x_1, x_2, x_3, x_4, x_5\}$.
First we might ask if $X \in \{x_1, x_2, x_4\}$. If yes, we could ask if $X = x_1$. If not, if
$X = x_2$. Backtracking, if $X \notin \{x_1, x_2, x_4\}$, we could ask whether $X = x_3$. This
strategy exhausts the possibilities for $X$. It is represented by the tree in Figure 8.

Figure 8. A guessing strategy



Given an $(n,2)$-tree $T$, there is a canonical way of arranging and parenthesizing
an expression of the form $x_1 \oplus_S \cdots \oplus_S x_n$ so that it may be evaluated. This is
the same one given in the Catalan number identity [15]. We consider the tree $T'$
which is labelled $1, \ldots, n$ from left to right. Let $\sigma_T \in \mathrm{Sym}_n$ be the permutation
that sends the left-to-right labeling to the original one on $T$. We define
$(x_1 \oplus_S \cdots \oplus_S x_n)_T = (x_{\sigma_T(1)} \oplus_S \cdots \oplus_S x_{\sigma_T(n)})_{T'}$. Thus, it suffices to consider the case
when $T$ is labelled left-to-right. In this case, there is a $1 \leq r < n$ such that for
$1 \leq j \leq r$, $x_j$ is a label of a leaf of the left subtree $L$, i.e. $x_j \in L$, and for all
$r < j \leq n$, $x_j \in D$, where $D$ is the right subtree of $T$. Then we define inductively

\[ (x_1 \oplus_S \cdots \oplus_S x_n)_T = (x_1 \oplus_S \cdots \oplus_S x_r)_L \oplus_S (x_{r+1} \oplus_S \cdots \oplus_S x_n)_D, \]

with a tree $T_2$ with two children giving $(x_1 \oplus_S x_2)_{T_2} = x_1 \oplus_S x_2$.

Theorem 10.2. Given an $(n,2)$-tree $T$ and a binary information measure $S$ with
identity, the following holds:

\[ (x_1 \oplus_S \cdots \oplus_S x_n)_T = \min_{\sum p_i = 1}\left(\sum_i p_i x_i - T S_T(p_1, \ldots, p_n)\right). \]

Proof. Before we begin the proof in earnest, we illustrate the argument with an
explicit example. The tree in Figure 7, which we denote $T$, corresponds to
the arrangement of parentheses $x_1 \oplus_S ((x_2 \oplus_S (x_3 \oplus_S x_4)) \oplus_S (x_5 \oplus_S x_6))$ and the
permutation $\sigma = (12)(34) \in \mathrm{Sym}_6$. We calculate

\[ \begin{aligned}
& x_1 \oplus_S ((x_2 \oplus_S (x_3 \oplus_S x_4)) \oplus_S (x_5 \oplus_S x_6)) \\
&= \min_{p_1}\Big(p_1 x_1 + (1-p_1)\big((x_2 \oplus_S (x_3 \oplus_S x_4)) \oplus_S (x_5 \oplus_S x_6)\big) - TS(p_1)\Big) \\
&= \min_{p_1}\Big(p_1 x_1 + (1-p_1)\min_{p_2}\big(p_2 (x_2 \oplus_S (x_3 \oplus_S x_4)) + (1-p_2)(x_5 \oplus_S x_6) - TS(p_2)\big) - TS(p_1)\Big) \\
&= \min_{p_1, p_2}\Big(p_1 x_1 + (1-p_1)p_2 \min_{p_3}\big(p_3 x_2 + (1-p_3)(x_3 \oplus_S x_4) - TS(p_3)\big) \\
&\qquad + (1-p_1)(1-p_2)\min_{p_4}\big(p_4 x_5 + (1-p_4)x_6 - TS(p_4)\big) - T\big(S(p_1) + (1-p_1)S(p_2)\big)\Big) \\
&= \min_{p_1, p_2, p_3, p_4, p_5}\Big(p_1 x_1 + (1-p_1)p_2 p_3 x_2 + (1-p_1)p_2(1-p_3)p_5 x_3 \\
&\qquad + (1-p_1)p_2(1-p_3)(1-p_5)x_4 + (1-p_1)(1-p_2)p_4 x_5 + (1-p_1)(1-p_2)(1-p_4)x_6 \\
&\qquad - T\big(S(p_1) + (1-p_1)S(p_2) + (1-p_1)p_2 S(p_3) + (1-p_1)(1-p_2)S(p_4) + (1-p_1)p_2(1-p_3)S(p_5)\big)\Big).
\end{aligned} \]

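Now we make the substitution

\[ \begin{aligned}
q_1 &= p_1 \\
q_2 &= (1-p_1)p_2 p_3 \\
q_3 &= (1-p_1)p_2(1-p_3)p_5 \\
q_4 &= (1-p_1)p_2(1-p_3)(1-p_5) \\
q_5 &= (1-p_1)(1-p_2)p_4 \\
q_6 &= (1-p_1)(1-p_2)(1-p_4).
\end{aligned} \]

We notice $q_1 + \cdots + q_6 = 1$, and

\[ \begin{aligned}
p_1 &= q_1 \\
p_2 &= (q_2 + q_3 + q_4)/(1 - q_1) \\
p_3 &= q_2/(q_2 + q_3 + q_4) \\
p_4 &= q_5/(q_5 + q_6) \\
p_5 &= q_3/(q_3 + q_4).
\end{aligned} \]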

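Notice that these look like relative probabilities. This is no coincidence. Making
this substitution above yields

\[ \begin{aligned}
& x_1 \oplus_S ((x_2 \oplus_S (x_3 \oplus_S x_4)) \oplus_S (x_5 \oplus_S x_6)) \\
&= \min_{\sum q_i = 1}\Big(\sum_i q_i x_i - T\Big(S(q_1) + (1-q_1)\,S\!\left(\frac{q_2+q_3+q_4}{1-q_1}\right) \\
&\qquad + (q_2+q_3+q_4)\,S\!\left(\frac{q_2}{q_2+q_3+q_4}\right) + (q_3+q_4)\,S\!\left(\frac{q_3}{q_3+q_4}\right) + (q_5+q_6)\,S\!\left(\frac{q_5}{q_5+q_6}\right)\Big)\Big).
\end{aligned} \]

Applying $\sigma$ we obtain

\[ (x_1 \oplus_S x_2 \oplus_S x_3 \oplus_S x_4 \oplus_S x_5 \oplus_S x_6)_T = \min_{\sum p_i = 1}\left(\sum_i p_i x_i - T S_T(p_1, \ldots, p_6)\right) \]

as the theorem claims.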

Now we are ready to prove the theorem in general. 

Lemma 10.3. Suppose that, at the root, the tree $T$ has left subtree $L$ with $l$ leaves,
and right subtree $D$ with $d$ leaves, and the leaves of $T$ are labeled left to right. Then

\[ \begin{aligned}
S_T(p_1, \ldots, p_l, p_{l+1}, \ldots, p_{l+d}) ={}& S(p_1 + \cdots + p_l) + (p_1 + \cdots + p_l)\,S_L\!\left(\frac{p_1}{p_1 + \cdots + p_l}, \ldots, \frac{p_l}{p_1 + \cdots + p_l}\right) \\
&+ (p_{l+1} + \cdots + p_{l+d})\,S_D\!\left(\frac{p_{l+1}}{p_{l+1} + \cdots + p_{l+d}}, \ldots, \frac{p_{l+d}}{p_{l+1} + \cdots + p_{l+d}}\right).
\end{aligned} \]

Lemma 10.4. Suppose at the root, $T$ has left subtree $L$ with $l$ leaves, and right
subtree $D$ with $d$ leaves, and the leaves of $T$ are labeled left to right. Then

\[ (x_1 \oplus_S \cdots \oplus_S x_l \oplus_S x_{l+1} \oplus_S \cdots \oplus_S x_{l+d})_T = \min_{p}\Big(p\,(x_1 \oplus_S \cdots \oplus_S x_l)_L + (1-p)\,(x_{l+1} \oplus_S \cdots \oplus_S x_{l+d})_D - TS(p)\Big). \]

The proof of both of these statements is immediate from the definitions.

Now, clearly the theorem holds when T has two leaves, and since our trees are 
full, we can use this as the base case in an induction. 

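Suppose the theorem holds for all trees with fewer than $n$ leaves. Let $T$ be an
$(n,2)$-tree with leaves labeled from left to right. At the root, since $T$ is full, $T$
has nonempty left and right subtrees, $L$ and $D$, with $l > 0$ and $d > 0$ leaves,
respectively, such that $l + d = n$, so $l, d < n$. By the inductive hypothesis and the
second lemma above,

\[ \begin{aligned}
(x_1 \oplus_S \cdots \oplus_S x_n)_T = \min_{p}\Big(& p \min_{p_1 + \cdots + p_l = 1}\Big(\sum_{i=1}^{l} p_i x_i - T S_L(p_1, \ldots, p_l)\Big) \\
&+ (1-p)\min_{p_{l+1} + \cdots + p_{l+d} = 1}\Big(\sum_{i=l+1}^{l+d} p_i x_i - T S_D(p_{l+1}, \ldots, p_{l+d})\Big) - TS(p)\Big).
\end{aligned} \]

Make the substitution $q_i = p\,p_i$ for each $i \in \{1, \ldots, l\}$, and $q_i = (1-p)\,p_i$ for
each $i \in \{l+1, \ldots, l+d\}$. Note that $q_1 + \cdots + q_l = p$ and $q_{l+1} + \cdots + q_{l+d} = 1 - p$.
This yields

\[ \begin{aligned}
(x_1 \oplus_S \cdots \oplus_S x_n)_T = \min_{\sum q_i = 1}\Big(\sum_i q_i x_i - T\Big(& (q_1 + \cdots + q_l)\,S_L\!\left(\frac{q_1}{q_1 + \cdots + q_l}, \ldots, \frac{q_l}{q_1 + \cdots + q_l}\right) \\
&+ (q_{l+1} + \cdots + q_{l+d})\,S_D\!\left(\frac{q_{l+1}}{q_{l+1} + \cdots + q_{l+d}}, \ldots, \frac{q_{l+d}}{q_{l+1} + \cdots + q_{l+d}}\right) \\
&+ S(q_1 + \cdots + q_l)\Big)\Big),
\end{aligned} \]

which by the first lemma is

\[ \min_{\sum q_i = 1}\left(\sum_i q_i x_i - T S_T(q_1, \ldots, q_n)\right). \]

We need now show that this holds for arbitrary labelings of the leaves of $T$. If
$\sigma$ is a permutation of $\{1, \ldots, n\}$, then

\[ \begin{aligned}
(x_{\sigma(1)} \oplus_S \cdots \oplus_S x_{\sigma(n)})_T &= \min_{\sum q_i = 1}\left(\sum_i q_i x_{\sigma(i)} - T S_T(q_1, \ldots, q_n)\right) \\
&= \min_{\sum p_i = 1}\left(\sum_i p_i x_i - T S_T(p_{\sigma(1)}, \ldots, p_{\sigma(n)})\right),
\end{aligned} \]

where we have substituted $p_i = q_{\sigma^{-1}(i)}$. This proves the theorem. □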
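The statement of Theorem 10.2 can be probed numerically for the smallest nontrivial tree, $x_1 \oplus_S (x_2 \oplus_S x_3)$, whose tree entropy is $S_T(q_1, q_2, q_3) = S(q_1) + (1-q_1)\,S(q_2/(1-q_1))$. The sketch below compares the nested binary minimizations with a direct minimization over a grid on the simplex, for the Shannon entropy and for the binary Tsallis entropy with $\alpha = 2$. Grid sizes, tolerances and function names are assumptions of this illustration, and agreement is only up to discretization error.

```python
import math

def shannon(p):
    """Binary Shannon entropy in nats, with 0*log(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def tsallis2(p):
    """Binary Tsallis entropy with alpha = 2, which equals 2p(1-p)."""
    return 2.0 * p * (1.0 - p)

def oplus(x, y, T, S, steps=201):
    """Grid approximation of min over p of p*x + (1-p)*y - T*S(p)."""
    grid = (i / (steps - 1) for i in range(steps))
    return min(p * x + (1.0 - p) * y - T * S(p) for p in grid)

def nested(x1, x2, x3, T, S):
    """Left-hand side of the theorem for the tree (1, (2, 3))."""
    return oplus(x1, oplus(x2, x3, T, S), T, S)

def tree_entropy(q1, q2, q3, S):
    """S_T for the tree (1, (2, 3)): S(q1) + (q2+q3) * S(q2 / (q2+q3))."""
    rest = q2 + q3
    return S(q1) + (rest * S(q2 / rest) if rest > 0.0 else 0.0)

def direct(x1, x2, x3, T, S, n=200):
    """Right-hand side: minimize over a grid on the simplex q1 + q2 + q3 = 1."""
    best = math.inf
    for i in range(n + 1):
        for j in range(n + 1 - i):
            q1, q2, q3 = i / n, j / n, (n - i - j) / n
            value = q1 * x1 + q2 * x2 + q3 * x3 - T * tree_entropy(q1, q2, q3, S)
            best = min(best, value)
    return best

xs, T = (0.3, 1.0, 0.5), 1.0
for S in (shannon, tsallis2):
    print(S.__name__, nested(*xs, T, S), direct(*xs, T, S))
```

For Shannon the common value is also $-T\log\sum_i e^{-x_i/T}$, independent of the tree; for other measures the value generally depends on the chosen tree.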

The connection between these guessing games and the thermodynamics of mixing 
discussed in [S] can be intuited in the following way. The entropy of a system arises 
from considering the "correct counting" of states. In more words, some states are 
indistinguishable from others, and this affects their multiplicity in the partition sum. 
The entropy function tells us what the overall degree of distinguishability is. We can 
see this point of view in Boltzmann's famous equation asserting $S = k_B \log \Omega$, where
$\Omega$ is the number of microstates which degenerate to a given macrostate. When we
perform mixtures in a certain order, we are giving an order to this distinguishing 
process, as we are when we decide on an order to ask questions in a guessing game, 
distinguishing possible values from impossible values of the unknown variable. 

10.3. General guessing trees. Now suppose for each $n \in V \subset \{m \in \mathbb{N} \mid m \geq 2\}$
we have an $n$-ary information measure $S_n$. We want to impose the following
condition.

(1) (Coherence) Suppose $n > m$ and $p_j = 0$ for all $j$ other than $1 \leq i_1 < \cdots < i_m \leq n$.
Then

\[ S_n(p_1, \ldots, p_n) = S_m(p_{i_1}, \ldots, p_{i_m}). \]

We can always write

\[ S_{n-1}(p_1, \ldots, p_{n-1}) = S_n(p_1, \ldots, p_{n-1}, 0), \]

so that we can take $V$ as an initial segment of $\mathbb{N}_{\geq 2}$. This way, we can instead think
about $v = \sup V$. Many definitions of entropies have $v = \infty$. Examples include the
Shannon, Rényi, and Tsallis entropies. These are generally defined by functions
$f$, $g$ such that

\[ S_n(p_1, \ldots, p_n) = f\Big(\sum_i g(p_i)\Big). \]

Any entropy of this form with $g(0) = 0$ trivially satisfies the coherence axiom.

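In this more general setting, we can ask any question with up to $v$ possible
answers. This potentially gives us many new ways to play guessing games, or
equivalently, to build more general information measures. For example, if $v \geq 5$,
then we can measure a 12-ary random variable $X \in \{x_1, \ldots, x_{12}\}$ by asking first
whether $X \in \{x_1, \ldots, x_5\}$. If yes, we simply measure the value of $X$. Otherwise, ask
which of $\{x_6, x_7, x_8\}$, $\{x_9, x_{12}\}$, or $\{x_{10}, x_{11}\}$ contains $X$, and then simply measure
the value of $X$ (note that order may matter: $S_n$ may not be symmetric). This gives
us information

\[ \begin{aligned}
& S_2(p_1 + \cdots + p_5,\; p_6 + \cdots + p_{12}) \\
&+ (p_1 + \cdots + p_5)\,S_5\!\left(\frac{p_1}{p_1 + \cdots + p_5}, \ldots, \frac{p_5}{p_1 + \cdots + p_5}\right) \\
&+ (p_6 + \cdots + p_{12})\,S_3\!\left(\frac{p_6 + p_7 + p_8}{p_6 + \cdots + p_{12}}, \frac{p_9 + p_{12}}{p_6 + \cdots + p_{12}}, \frac{p_{10} + p_{11}}{p_6 + \cdots + p_{12}}\right) \\
&+ (p_6 + p_7 + p_8)\,S_3\!\left(\frac{p_6}{p_6 + p_7 + p_8}, \frac{p_7}{p_6 + p_7 + p_8}, \frac{p_8}{p_6 + p_7 + p_8}\right) \\
&+ (p_9 + p_{12})\,S_2\!\left(\frac{p_9}{p_9 + p_{12}}, \frac{p_{12}}{p_9 + p_{12}}\right) \\
&+ (p_{10} + p_{11})\,S_2\!\left(\frac{p_{10}}{p_{10} + p_{11}}, \frac{p_{11}}{p_{10} + p_{11}}\right),
\end{aligned} \]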






MATILDE MARCOLLI AND RYAN THORNGREN 



where now we write $S_2$ as a two-variable function for consistency. We see something
extremely similar to the binary case is happening here.
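The same recursion as in the binary case computes the information of a general guessing tree: every internal vertex with $k$ children contributes its probability mass times $S_k$ of the relative masses of its children. The sketch below encodes the 12-ary example as nested lists (an encoding chosen for illustration) and uses the $n$-ary Shannon entropy for every $S_k$, in which case the total equals the Shannon entropy of the whole distribution. The test probabilities are arbitrary.

```python
import math

def shannon_n(dist):
    """n-ary Shannon entropy in nats; serves as S_k for every arity k."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)

def mass(tree, probs):
    """Total probability of the leaves below a vertex; a leaf is an integer label."""
    if isinstance(tree, int):
        return probs[tree]
    return sum(mass(child, probs) for child in tree)

def tree_entropy(tree, probs, S=shannon_n):
    """Information of the guessing strategy encoded by the tree."""
    if isinstance(tree, int):
        return 0.0
    masses = [mass(child, probs) for child in tree]
    total = sum(masses)
    if total == 0.0:
        return 0.0
    # Question at this vertex: "which child subtree contains X?"
    local = total * S([m / total for m in masses])
    return local + sum(tree_entropy(child, probs, S) for child in tree)

# First {x1..x5} versus the rest, then {x6,x7,x8}, {x9,x12}, {x10,x11}.
TREE = [[1, 2, 3, 4, 5], [[6, 7, 8], [9, 12], [10, 11]]]
probs = {i: i / 78 for i in range(1, 13)}   # weights 1..12, normalized

print(tree_entropy(TREE, probs), shannon_n(list(probs.values())))
```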

Proposition 10.5. Let $n, v \geq 2$, and suppose for each $2 \leq j < v + 1$ we have
a $j$-ary information measure $S_j$, and these together satisfy the coherence axiom.
Guessing strategies of $n$-ary random variables where we allow questions of up to $v$
possible answers are in bijective correspondence with the set of $(n,v)$-trees, rooted
trees with labelled leaves such that every vertex is either a leaf or has between 2 and
$v$ children.

Proof. Every relevant question that can be asked is of the form "which of $A_1, \ldots, A_m$
contains $X$?" for certain disjoint subsets $A_1, \ldots, A_m$. We identify these subsets
with the leaves of the $m$ subtrees extending from the current vertex, once again
identifying the vertices with states of our knowledge of $X$.

For example, from the previous algorithm we have the tree in Figure 9.

Figure 9. An $(n,v)$-tree

Conversely, to go from an $(n,v)$-tree to a guessing strategy one must only follow
the tree to its leaves.

Now we have some basic $n$-ary functions for more than just $n = 2$. Namely, we
can define

\[ x_1 \oplus_S \cdots \oplus_S x_n = \min_{\sum p_i = 1}\left(\sum_i p_i x_i - T S_n(p_1, \ldots, p_n)\right). \tag{10.1} \]

This has a thermodynamic interpretation as a simultaneous mixing of $n$ gas
species. We have the following result.

Proposition 10.6. Let $n \geq 2$. The following hold.

(1) For every $j$,

\[ x_1 \oplus_S \cdots \oplus_S x_j \oplus_S \infty \oplus_S x_{j+2} \oplus_S \cdots \oplus_S x_n = x_1 \oplus_S \cdots \oplus_S x_j \oplus_S x_{j+2} \oplus_S \cdots \oplus_S x_n \]

if and only if the $S_n$ share the coherence property. For $n = 2$ this is the
identity property.

(2) $x_1 \oplus_S \cdots \oplus_S x_n$ is symmetric if and only if $S_n$ is symmetric. For $n = 2$
this is commutativity.

The proof of this fact is immediate.

We can generalize the parentheses correspondence to $(n,v)$-trees: to any $(n,v)$-tree
$T$ we can associate a unique $n$-ary function $(x_1 \oplus_S \cdots \oplus_S x_n)_T$ given by arranging
parentheses according to $T$ and the $x_i$ according to the labels on the leaves.

For example, to the tree above we associate

\[ (x_1 \oplus_S x_2 \oplus_S x_3 \oplus_S x_4 \oplus_S x_5) \oplus_S ((x_6 \oplus_S x_7 \oplus_S x_8) \oplus_S (x_9 \oplus_S x_{12}) \oplus_S (x_{10} \oplus_S x_{11})). \]

We have then the analog for $(n,v)$-trees of Lemmas 10.3 and 10.4.

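Lemma 10.7. Suppose the root of an $(n,v)$-tree $T$ has sub-$(l_j,v)$-trees (resp. from
left to right) $A_1, \ldots, A_m$, and the leaves of $T$ are labeled left to right. Let $L_j =
l_1 + \cdots + l_j$ and $L_0 = 0$. Then the following holds.

\[ \begin{aligned}
S_T(p_1, \ldots, p_n) ={}& \sum_{j=1}^{m} (p_{L_{j-1}+1} + \cdots + p_{L_j})\,S_{A_j}\!\left(\frac{p_{L_{j-1}+1}}{p_{L_{j-1}+1} + \cdots + p_{L_j}}, \ldots, \frac{p_{L_j}}{p_{L_{j-1}+1} + \cdots + p_{L_j}}\right) \\
&+ S_m(p_{L_0+1} + \cdots + p_{L_1}, \ldots, p_{L_{m-1}+1} + \cdots + p_{L_m}).
\end{aligned} \]

Lemma 10.8. Suppose the root of an $(n,v)$-tree $T$ has sub-$(l_j,v)$-trees (resp. from
left to right) $A_1, \ldots, A_m$, and the leaves of $T$ are labeled left to right. Let $L_j =
l_1 + \cdots + l_j$, $L_0 = 0$. Then the following holds:

\[ \begin{aligned}
(x_1 \oplus_S \cdots \oplus_S x_n)_T = \min_{\sum q_j = 1}\Big(& q_1 (x_1 \oplus_S \cdots \oplus_S x_{l_1})_{A_1} + \cdots \\
&+ q_m (x_{l_1 + \cdots + l_{m-1} + 1} \oplus_S \cdots \oplus_S x_{l_1 + \cdots + l_m})_{A_m} - T S_m(q_1, \ldots, q_m)\Big).
\end{aligned} \]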

As before, both of these are immediate from the definitions. Finally, we have 
the theorem: 

Theorem 10.9. Given an $(n,v)$-tree $T$, and for each $2 \leq j \leq n$ an information
measure $S_j$, such that together they satisfy the coherence axioms, the following
holds:

\[ (x_1 \oplus_S \cdots \oplus_S x_n)_T = \min_{\sum p_i = 1}\left(\sum_i p_i x_i - T S_T(p_1, \ldots, p_n)\right). \]

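Proof. Once again we proceed by strong induction on the number of leaves. We
know the theorem holds for $n = 2$. Suppose the theorem holds for every $(m,v)$-tree
with $m < n$. Let $T$ be an $(n,v)$-tree with leaves labeled from left to right. $T$ has
$k \geq 2$ sub-$(l_i,v)$-trees starting at the root (resp. from left to right) $A_1, \ldots, A_k$ with
$l_i > 0$. We must have $l_1 + \cdots + l_k = n$, so $l_i < n$. By the inductive hypothesis and
the second lemma above then,

\[ \begin{aligned}
(x_1 \oplus_S \cdots \oplus_S x_n)_T = \min_{\sum q_i = 1}\Big(& q_1 \min_{p_1 + \cdots + p_{l_1} = 1}\big(p_1 x_1 + \cdots + p_{l_1} x_{l_1} - T S_{A_1}(p_1, \ldots, p_{l_1})\big) + \cdots \\
&+ q_k \min\Big(\sum_{j = l_1 + \cdots + l_{k-1} + 1}^{n} p_j x_j - T S_{A_k}(p_{l_1 + \cdots + l_{k-1} + 1}, \ldots, p_{l_1 + \cdots + l_k})\Big) \\
&- T S_k(q_1, \ldots, q_k)\Big).
\end{aligned} \]

For each $i \in \{1, \ldots, k\}$, and each $j \in \{l_1 + \cdots + l_{i-1} + 1, \ldots, l_1 + \cdots + l_i\}$, where we
define $l_0 = 0$, we introduce the new variable $r_j = q_i p_j$. That way, $\sum_{j = l_1 + \cdots + l_{i-1} + 1}^{l_1 + \cdots + l_i} r_j = q_i$,
so we have

\[ \begin{aligned}
(x_1 \oplus_S \cdots \oplus_S x_n)_T = \min_{\sum r_j = 1}\Big(\sum_j r_j x_j - T\Big(& (r_1 + \cdots + r_{l_1})\,S_{A_1}\!\left(\frac{r_1}{r_1 + \cdots + r_{l_1}}, \ldots\right) + \cdots \\
&+ (r_{l_1 + \cdots + l_{k-1} + 1} + \cdots + r_n)\,S_{A_k}\!\left(\frac{r_{l_1 + \cdots + l_{k-1} + 1}}{r_{l_1 + \cdots + l_{k-1} + 1} + \cdots + r_n}, \ldots\right) \\
&+ S_k(r_1 + \cdots + r_{l_1}, \ldots, r_{l_1 + \cdots + l_{k-1} + 1} + \cdots + r_n)\Big)\Big).
\end{aligned} \]

By Lemma 10.7 this equals

\[ \min_{\sum r_j = 1}\left(\sum_j r_j x_j - T S_T(r_1, \ldots, r_n)\right). \]

Now let $\sigma$ be any permutation of $\{1, \ldots, n\}$. We see

\[ \begin{aligned}
(x_{\sigma(1)} \oplus_S \cdots \oplus_S x_{\sigma(n)})_T &= \min_{\sum q_i = 1}\left(\sum_i q_i x_{\sigma(i)} - T S_T(q_1, \ldots, q_n)\right) \\
&= \min_{\sum p_i = 1}\left(\sum_i p_i x_i - T S_T(p_{\sigma(1)}, \ldots, p_{\sigma(n)})\right),
\end{aligned} \]

where we have substituted $p_i = q_{\sigma^{-1}(i)}$. This proves the theorem. □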

10.4. Information algebra. We define $\mathcal{T}_v(n)$ to be the class of $(n,v)$-trees such
that $\mathcal{T}_v(0)$ contains only the empty graph and $\mathcal{T}_v(1)$ contains only the unique one-leaved
tree. We put an operad structure on the union of these collections,
$\mathcal{T}$, with composition given by leaf-to-root composition of trees, which is clearly
unital, associative, and Sym-equivariant. Our underlying category is the cartesian
monoidal category of sets of graphs, with $\kappa = \mathcal{T}_v(1)$. Note that this unital operad
structure, if also given a free group structure, forms the well-known $A_\infty$-operad.
Consider the one-object topological category, $R$, and a coherent set

\[ \{ S_j : \mathcal{P}(j) \to \mathbb{R}_{\geq 0} \mid 2 \leq j < v + 1 \} \]

of information measures. For each $n \geq 2$, and each $T \in \mathcal{T}(n)$, we define

\[ T(x_1, \ldots, x_n) = \min_{\sum p_i = 1}\left(\sum_i p_i x_i - T S_T(p_1, \ldots, p_n)\right), \]

with the definition of $S_T$ as in the previous section. For $T \in \mathcal{T}(1)$, we define $(x)_T = x$;
for $\mathcal{T}_v(0)$, we define $()_T = \infty$.

By Theorem 10.9 above, this is the same as $(x_1 \oplus_S \cdots \oplus_S x_n)_T$, which clearly
behaves well under composition of trees and the action of $\mathrm{Sym}_n$, so this makes $R$
a $\mathcal{T}$-algebra, which we call the information algebra $(R,S)$. This characterizes the
complete algebraic structure of the Witt semiring $R$ over $K$ arising from $S$. The
next proposition is written in the original convention for semifields and summarizes
some characteristics of this action, each of which is immediate from the definitions.

Proposition 10.10. Let $T \in \mathcal{T}(n)$, $x_1, \ldots, x_n, y \in K$, $\alpha \in \mathbb{R}_{\geq 0}$. Then the following
hold.

(1) The $\mathcal{T}$-algebra structure on $R$ is additive: for all $1 \leq j \leq n$,

\[ T(x_1, \ldots, x_{j-1}, x_j + y, x_{j+1}, \ldots, x_n) = T(x_1, \ldots, x_j, \ldots, x_n) + T(x_1, \ldots, y, \ldots, x_n). \]

(2) Multiplication distributes over the $\mathcal{T}$-algebra structure:

\[ y\,(x_1 \oplus_S \cdots \oplus_S x_n)_T = (y x_1 \oplus_S \cdots \oplus_S y x_n)_T. \]

(3) The $\mathcal{T}$-algebra structure also satisfies

{xi ©s • ■ • ©s x,,)^{T) ^{x'^Os--- (Bs xZhiaT). 

The relations which are most natural to consider are of the form

\[ T_1(x_1, \ldots, x_n) = T_2(x_1, \ldots, x_n) \quad \forall x_i \in R, \]

where $T_1$ and $T_2$ are $(n,v)$-trees acting on the information algebra $(R,S)$. The
reason is that we can interpret this as an equivalence of guessing strategies, so these
are exactly the kind of relations that would define an information measure. Note
it is not just $T_1$ and $T_2$ which are affected by this relation. Because composition
of trees gives the composition of their actions on $R$, whenever some tree can be
written $A \circ T_1 \circ (A_1, \ldots, A_n)$, this is equivalent to $A \circ T_2 \circ (A_1, \ldots, A_n)$. The
equivalence classes of these trees for some fixed set of relations $\mathcal{R}$ define a quotient
operad $\mathcal{T}/\mathcal{R}$, which is the set of possible guessing strategies up to equivalence under
the information measure. The terminal object in this construction is the one with
exactly one $(m,v)$-tree for each $m$, which is precisely the quotient operad arising
from the Shannon entropy.

However, one quickly finds that these simple relations are inadequate for describing
the full range of information measures. If we have an equivalence of trees, we
can always prune corresponding leaves by inserting the identity, $\infty$ in the current
notation, in the place of that variable. For binary information measures, one has
the following fact.

Proposition 10.11. Suppose $S$ is a commutative binary information measure,
and $T_1$, $T_2$ are $(n,2)$-trees. Either $T_1 = T_2$ is implied by commutativity or it implies
associativity, hence forces $S$ to be the Shannon entropy.

Proof. We proceed by induction on n. When n = 3, one checks the above is true by 
simply checking each case. Suppose the theorem holds for all m < n. By pruning 
a leaf, we see that either the relation implies associativity or the pruned subtrees 
are equal. In the case of the latter, prune a different leaf and we see the theorem 
holds. □ 

Thus, one may wish to pass into a setting where we may consider linear combinations
of trees, i.e. we can put a free vector space structure on our original
operad. This gives us an $A_\infty$ operad, with the action on the information algebra $R$
extending uniquely under the Frobenius action.

Let us now consider what an internal $\mathcal{T}$-algebra in $R$ is. For each $n$, this is a
continuous map $a_n : \mathcal{T}(n) \to R$ such that the following hold:

(1) for all $T \in \mathcal{T}(n)$, and $A_1 \in \mathcal{T}(m_1), \ldots, A_n \in \mathcal{T}(m_n)$,

\[ a_{m_1 + \cdots + m_n}(T \circ (A_1, \ldots, A_n)) = a_n(T) + T(a_{m_1}(A_1), \ldots, a_{m_n}(A_n)); \]

(2) for all $T \in \mathcal{T}(n)$ and $\sigma \in \mathrm{Sym}_n$,

\[ a_n(\sigma T) = a_n(T); \]

(3) The following condition also holds:

\[ a_1(\mathcal{T}(1)) = 0. \tag{10.2} \]

To simplify notation, we will suppress the subscripts on $a$ and just consider
$a : \mathcal{T} \to \mathbb{R}_{\geq 0}$. The second condition above just says that $a(T)$ does not depend on
the labels of $T$.

For each $2 \leq n < v + 1$ we define $h_n \in \mathbb{R}_{\geq 0}$ as the unique value $a$ takes on the
$(n,v)$-trees with $n + 1$ vertices, that is, those corresponding to $S_n$.

Every tree in $\mathcal{T}$ is built from these basic trees, and by the first condition above,
so is $a(T)$.

If at the root $T$ has subtrees $A_1, \ldots, A_n$ from left to right, then

\[ a(T) = h_n + a(A_1) \oplus_S \cdots \oplus_S a(A_n). \]

For the tree in Figure 9 this gives

\[ h_2 + h_5 \oplus_S (h_3 + h_3 \oplus_S h_2 \oplus_S h_2). \]

We see $+$ goes down the tree, and $\oplus_S$ goes across the tree.
It is easy to see $h_3 \geq h_3 \oplus_S h_2 \oplus_S h_2$, and $h_2 \geq h_5 \oplus_S (h_3 \oplus_S h_2 \oplus_S h_2)$, so this
can be simplified to

\[ h_5 \oplus_S (h_3 \oplus_S h_2 \oplus_S h_2), \]

which we see can be obtained through a different recursion strategy. Instead of
picking off the subtrees at the root, we could pick off the basic subtrees just above
the leaves. This is just another way of writing $T$ as a composition of trees, and
puts the recursion into the second term rather than the first in (10.2).

Because of this recursion, every internal $\mathcal{T}$-algebra of $\mathbb{R}_{\geq 0}$ is determined by the
sequence $(h_j)_{2 \leq j < v+1}$ (by the third condition above, implicitly $h_1 = 0$).

When $R = \mathbb{R}^{\max,\times}_{\geq 0}$, and we use $S = \mathrm{Sh}$, the Shannon entropy, then $x \oplus_S y =
(x^{1/T} + y^{1/T})^{T}$, so the above becomes

\[ a(T) = \max\left(h_n, \left(a(A_1)^{1/T} + \cdots + a(A_n)^{1/T}\right)^{T}\right). \]
11. Further perspectives and directions 

We sketch here some possible further directions where the notion of thermody- 
namic semirings may prove useful. 

11.1. Information geometry. Information geometry was developed in [2], [3], [22]
as a way to encode, using methods based on Riemannian geometry, statistical
information, such as how to infer unobserved variables on the basis of observed
ones by reducing conditional joint probabilities to marginal distributions.

We consider a smooth univariate binary statistical $n$-manifold $Q$ as in Definition
8.1, parameterized by $\eta \in A^n \subset \mathbb{R}^n$. One may deal with the multivariate case
similarly.

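The Fisher information metric (see [3]) on information manifolds is given by

$$g_{ij}(\theta) = \int \frac{\partial \ln p(x;\theta)}{\partial \theta_i} \, \frac{\partial \ln p(x;\theta)}{\partial \theta_j} \, p(x;\theta)\, dx,$$

and it defines a Riemannian metric on a statistical manifold $Q$.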
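
As a small sanity check of this definition, the sketch below computes the Fisher information of the one-parameter Bernoulli family $p(1;\theta) = \theta$, $p(0;\theta) = 1-\theta$, where the integral becomes a sum over the two outcomes weighted by $p(x;\theta)$. The choice of family, the finite-difference step, and the function names are assumptions made for illustration; the closed form $1/(\theta(1-\theta))$ is the standard value for this family.

```python
import math


def bernoulli(x: int, theta: float) -> float:
    """Probability of outcome x in {0, 1} under parameter theta."""
    return theta if x == 1 else 1.0 - theta


def fisher_information(theta: float, eps: float = 1e-6) -> float:
    """Sum over x of p(x; theta) * (d/dtheta ln p(x; theta))^2,
    with the derivative approximated by a central difference."""
    total = 0.0
    for x in (0, 1):
        dlog = (math.log(bernoulli(x, theta + eps))
                - math.log(bernoulli(x, theta - eps))) / (2.0 * eps)
        total += bernoulli(x, theta) * dlog * dlog
    return total


theta = 0.3
numeric = fisher_information(theta)
exact = 1.0 / (theta * (1.0 - theta))
print(numeric, exact)
```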
Another important notion in information geometry is that of e-flat and m-flat
submanifolds, which we recall here.

A submanifold $S \subset Q$ is e-flat if, for all $t \in [0, 1]$ and all $p(\eta)$ and $q(\eta)$ in $S$,
the mixture $\log r(\eta,t) = t \log p(\eta) + (1-t) \log q(\eta) + c(t)$, with $c(t)$ a normalization
factor, is also in $S$.

A submanifold $S \subset Q$ is m-flat if, for all $t \in [0, 1]$ and all $p(\eta)$ and $q(\eta)$ in $S$, the
mixture $r(\eta, t) = t p(\eta) + (1-t) q(\eta)$ is also in $S$.
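
The two kinds of mixture can be compared on a finite sample space. The following sketch, with hypothetical function names and illustrative distributions, forms the m-mixture (a convex combination of probabilities) and the e-mixture (a convex combination of log-probabilities, renormalized, so that $c(t)$ is minus the logarithm of the normalizing sum).

```python
import math
from typing import List


def m_mixture(p: List[float], q: List[float], t: float) -> List[float]:
    """r = t p + (1 - t) q."""
    return [t * pi + (1.0 - t) * qi for pi, qi in zip(p, q)]


def e_mixture(p: List[float], q: List[float], t: float) -> List[float]:
    """log r = t log p + (1 - t) log q + c(t), with c(t) fixed by normalization."""
    weights = [math.exp(t * math.log(pi) + (1.0 - t) * math.log(qi))
               for pi, qi in zip(p, q)]
    z = sum(weights)                 # c(t) = -log z
    return [w / z for w in weights]


p = [0.9, 0.1]                       # illustrative binary distributions
q = [0.2, 0.8]
m_mid = m_mixture(p, q, 0.5)
e_mid = e_mixture(p, q, 0.5)
print(m_mid, e_mid)
```

For these inputs the midpoints differ: the m-mixture is close to $(0.55, 0.45)$ while the e-mixture is close to $(0.6, 0.4)$, so the two notions of straight line between $p$ and $q$ are genuinely distinct.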

One-dimensional e-flat or m-flat manifolds are called e-geodesics and m-geodesics.
In information geometry, maximum posterior marginal optimization is achieved by 
finding the point on an e-flat submanifold S that minimizes the KL divergence, 
see [3], [HI- It turns out that the point on an e-flat submanifold S that mini- 
mizes the KL divergence also minimizes the Riemannian metric given by the Fisher 
information metric. 

More precisely, when considering the KL divergences $KL(p; q(\eta))$, where $q(\eta)$
varies in an e-flat submanifold $S$ of the given information manifold $Q$, there is a
unique point $q(\eta)$ in $S$ that minimizes $KL(p; q(\eta))$ and it is given by the point where
the m-geodesic from $p$ meets $S$ orthogonally with respect to the Fisher information
metric (see Theorem 1 of [22]).

Thus, from the point of view of information geometry, it seems especially inter- 
esting to look at cases of the thermodynamic semiring structures 

p 

for distributions $q(\eta)$ that vary along e-flat submanifolds of information manifolds
and recast some Riemannian aspects of information geometry in terms of algebraic 
properties of the thermodynamic semirings. 

11.2. Tropical geometry. Most of our results have a very natural thermodynamic
interpretation when written explicitly in the case of the tropical semifield (seen as
a prototype example of a characteristic one semiring as in [10], [11]). Thus, besides
the original motivation arising in the context of $\mathbb{F}_1$-geometry, it is possible that the
theory of thermodynamic semirings we developed here may have some interesting
applications in the setting of tropical geometry [24].

The use of tropical geometry in the context of probabilistic inference in statistical 
models was recently advocated in [41]. In that approach one considers polynomial
maps from a space of parameters to the space of joint probability distributions 
on a set of random variables. These give statistical models described by algebraic 
varieties. The tropicalization of the resulting algebraic variety is then used as a 
model for parametric inference, for instance, by interpreting marginal probabilities 
as coordinates of points on the variety. 

It would therefore seem interesting to extend the encoding of thermodynamic
and information-theoretic properties into the additive structure of the semiring
to the broader context of tropical varieties. In particular one can consider the
patchworking process, where operations are performed on the "quantized" varieties,
and then the limit in the Maslov dequantization, corresponding to the residue
morphism $T \to 0$, is performed, obtaining the new tropical variety.

Observe, for instance, that in the usual setting of tropical geometry, in passing
from an algebraic variety to its tropicalization, starting with a polynomial $f$ defining
a hypersurface $V$ in $(\mathbb{C}^*)^n$, one can proceed by first considering an associated Maslov
dequantization, given by a one-parameter family $f_h$, whose zero set one denotes by
$V_h$. One then considers the amoeba obtained by mapping $V_h$ to $\mathbb{R}^n$ under the map
$\mathrm{Log}_h(z_1, \ldots, z_n) = (h \log |z_1|, \ldots, h \log |z_n|)$. One obtains in this way the amoeba
$A_h = \mathrm{Log}_h(V_h)$. As we send the parameter $h \to 0$, the subsets $A_h \subset \mathbb{R}^n$ converge
in the Hausdorff metric to the tropical variety $\mathrm{Tro}(V)$, see [30]. For example, for a
polynomial of the form $f(x) = \sum_k a_k x^k$, one obtains $f_h(x)$ by passing to $a_k = e^{b_k}$
and $x^k = e^{kt}$, so that one can then replace $v = \log(\sum_k e^{kt + b_k})$ by the deformed
$v_h = h \log(\sum_k e^{(kt + b_k)/h})$, which in turn defines the dequantized family $f_h(x)$.
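
The behaviour of the deformed quantity $v_h$ as $h \to 0$ can be observed numerically. The sketch below is only an illustration: the coefficients $b_k$, the evaluation point $t$, and the function names are assumptions, and the sum is computed with the usual log-sum-exp shift to avoid overflow for small $h$. It compares $v_h$ with the tropical value $\max_k (kt + b_k)$ and uses the elementary bound $0 \leq v_h - \max_k(kt + b_k) \leq h \log N$ for $N$ terms.

```python
import math
from typing import List


def v_h(t: float, b: List[float], h: float) -> float:
    """h * log(sum_k exp((k t + b_k) / h)), computed stably."""
    terms = [(k * t + bk) / h for k, bk in enumerate(b)]
    m = max(terms)
    return h * (m + math.log(sum(math.exp(x - m) for x in terms)))


def tropical(t: float, b: List[float]) -> float:
    """Tropical (max-plus) polynomial max_k (k t + b_k)."""
    return max(k * t + bk for k, bk in enumerate(b))


b = [0.0, 1.0, -2.0]     # illustrative coefficients b_0, b_1, b_2
t = 0.5
limit = tropical(t, b)
gaps = {h: v_h(t, b, h) - limit for h in (1.0, 0.1, 0.01, 0.001)}
print(limit, gaps)
```

The gap shrinks with $h$, consistent with $h$ playing the role of a temperature that is sent to zero.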

By comparing with Proposition 4.3 one can see that the Maslov dequantization
can be expressed in terms of the operation $\oplus_{\mathrm{Sh},T}$, where the dequantization
parameter $h$ plays the role of the temperature $T$, as also observed in [10]. Therefore,
one can introduce variants of the Maslov dequantization procedure, based on other
operations $\oplus_{S,T}$, for other choices of information measures. In particular, one can
consider dequantizations based on various $n$-ary information measures of the form



with the data labelled by trees $T$, as we described in §10 above.

While one can expect that the tropical limit itself will be independent of the 
use of different information measures in the dequantization procedure, operations 
performed at the level of the amoebas $A_h$ will likely have variants with different
properties when the Shannon entropy is replaced by other information measures of 
the kind considered in this paper. 

11.3. The thermodynamics of $\mathbb{R}^{\mathrm{un}}$. In the characteristic $p$ case, the functoriality
of the Witt construction provides a way to construct extensions of the field of
p-adic numbers $\mathbb{Q}_p = \mathrm{Frac}(\mathbb{Z}_p)$ using the fact that $\mathbb{Z}_p = W_p(\mathbb{F}_p)$, and applying the
same Witt functor to extensions $\mathbb{F}_q$. This gives $W_p(\mathbb{F}_q) = \mathbb{Z}_p[\zeta_{q-1}]$, which is the
valuation ring of an unramified extension $\mathbb{Q}_p(\zeta_{q-1})$ of $\mathbb{Q}_p$, see [33].

It was observed in §7 of [10] that, in the case of the characteristic one version of
the Witt construction, when one considers the $\oplus_{\mathrm{Sh},T}$ simultaneously for all possible
temperatures $T$, one can describe a candidate analog of "unramified extension" $\mathbb{R}^{\mathrm{un}}$
in terms of analogs of Teichmüller characters given in the form $\chi_T(f) = f(T)^{1/T}$
and an analog of the residue morphism of the form $\epsilon(f) = \lim_{T \to 0} \chi_T(f)(T)^T$.

We can formulate this in the general case. We find, first of all, that the Frobenius
lifts do not depend on the information measure.

Proposition 11.1. If $R$ is a thermodynamic semiring over a suitably nice semifield
$K$ defined by the information measure $S$, then the Frobenius lifts from $K$ to $R$ in
such a way that

$$\mathrm{Fr}(x(T)) = x(T/r)^r.$$

Proof. We see that this is a result of the general form of the temperature dependence
in the current context. In symbols, we are looking for

$$\mathrm{Fr}(x(T) \oplus_S y(T)) = \sum_{\alpha} e^{r f(T) S(\alpha)} \, x(f(T))^{r\alpha} \, y(f(T))^{r(1-\alpha)},$$

where the residue morphism forces $\mathrm{Fr}(x(T)) = x(f(T))^r$ for some invertible $f$,
depending on $r$. We see from the above that $f(T) = T/r$, proving the claim. □

This forces the characters to have the same form as in the Shannon entropy case,
i.e. $\chi_T(f)(T) = f(T)^{1/T}$. However, these characters are additive only if $(x(T) \oplus_S
y(T))^{1/T} = x^{1/T} + y^{1/T}$, which means $S$ must produce the same thermodynamic
structure as the Shannon entropy, hence, by a theorem above, $S$ is the Shannon
entropy. Note that this analysis holds also in the q-deformed Witt construction
leading to the Tsallis entropy discussed in §7.1.
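
Both statements can be checked numerically in the Shannon case over the max-times semifield, where $x \oplus_S y = (x^{1/T} + y^{1/T})^T$. In the sketch below, elements of the semiring are modelled as positive functions of the temperature; the sample functions, the value of $r$, and all names are assumptions chosen for illustration. The code verifies that $\mathrm{Fr}(x)(T) = x(T/r)^r$ is compatible with $\oplus_S$ and that the character $f \mapsto f(T)^{1/T}$ is additive.

```python
import math
from typing import Callable

Elem = Callable[[float], float]      # an element is a function of temperature


def oplus(x: Elem, y: Elem) -> Elem:
    """Shannon-deformed sum, temperature by temperature."""
    return lambda temp: (x(temp) ** (1.0 / temp) + y(temp) ** (1.0 / temp)) ** temp


def frobenius(x: Elem, r: float) -> Elem:
    """Fr(x)(T) = x(T / r)^r."""
    return lambda temp: x(temp / r) ** r


def character(f: Elem, temp: float) -> float:
    """chi_T(f) = f(T)^{1/T}."""
    return f(temp) ** (1.0 / temp)


x: Elem = lambda temp: 1.0 + temp            # illustrative elements
y: Elem = lambda temp: 2.0 + temp * temp
temp, r = 0.8, 3.0

fr_of_sum = frobenius(oplus(x, y), r)(temp)
sum_of_fr = oplus(frobenius(x, r), frobenius(y, r))(temp)
chi_of_sum = character(oplus(x, y), temp)
sum_of_chi = character(x, temp) + character(y, temp)
print(fr_of_sum, sum_of_fr, chi_of_sum, sum_of_chi)
```

For an information measure other than the Shannon entropy one would replace `oplus`, and the second equality would in general fail, which is the nonadditivity discussed in the text.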

If we pass to the field of fractions of these characters, and consider further
infinite sums of these characters, the resulting expressions begin to resemble partition
functions in the Euclidean path integral formulation, see §7 in [10]. Indeed, if one
uses instead $\mathbb{R}^{\min,+} \cup \{\infty\}$, these are equal to equilibrium free energies of the type
observed above. The failure of the additivity of the characters in $\mathcal{R}^{\mathrm{un}}$ can thus be
interpreted in terms of nonextensivity. This suggests that, as this candidate for
$\mathcal{R}^{\mathrm{un}}$ is investigated, new algebraic interpretations of nonextensivity will arise. It
would also be interesting to see if a notion of character which is additive on the
q-deformed Witt construction could give rise to a one-parameter family of $\mathcal{R}^{\mathrm{un}}$'s.

11.4. Thermodynamics in positive characteristics. The main motivation for
the Witt construction in characteristic one given in [10] and [11], which provides the
prototype example of a thermodynamic semiring built on the Shannon entropy, is
to provide an analog in characteristic one of the formulae for the summation of
Teichmüller representatives in the case of multiplicative lifts to $\mathbb{Z}_p$ of the characteristic
$p$ elements in $\mathbb{F}_p$.

One can then reverse the point of view and start from the more general
thermodynamic semirings associated to other forms of entropy, such as Rényi, Tsallis,
Kullback-Leibler, with their axiomatic characterizations, and look for
characteristic $p$ analogs of non-extensive thermodynamics and other such variants of statistical
physics.

For instance, we saw in §7 above that there is a one-parameter deformation
of the Witt construction in characteristic one, which yields a characterization of
the Tsallis entropy $\mathrm{Ts}_\alpha$ as the unique binary information measure that satisfies
the associativity, commutativity and unity constraints for this deformed $\oplus_{S,T,\alpha}$
operation.

One thinks of the original $\oplus_{\mathrm{Sh},T}$ with the Shannon entropy, as in [10] and [11],
as being the correct analog in characteristic one of the p-adic Witt construction

$$x \oplus_w y = \sum_{s \in I_p} w_p(s)\, x^s y^{1-s},$$

with $I_p$ the set of rational numbers in $[0, 1]$ with denominator a power of $p$ and

$$w_p(s) = \sum_{a/p^n = s} w(p^n, a)\, T^n \in \mathbb{F}_p((T)),$$

where the $w(p^n, k) \in \mathbb{Z}/p\mathbb{Z}$, for $0 < k < p^n$, are determined by the addition of
Teichmüller representatives

$$\tau(x) + \tau(y) = \tau(x + y) + \sum_{n=1}^{\infty} \tau\Big( \sum_{0 < k < p^n} w(p^n, k)\, x^{k/p^n} y^{1 - k/p^n} \Big)\, p^n.$$
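
The first term of this addition rule can be tested by hand modulo $p^2$, where only $n = 1$ contributes. The sketch below works with $x, y \in \mathbb{F}_p$, so that $p$-th roots are trivial and $x^{k/p} y^{1-k/p}$ reduces to $x^k y^{p-k}$. It assumes that the $n$-th term has the form $\tau(\sum_k w(p^n,k) x^{k/p^n} y^{1-k/p^n}) p^n$ and that the first-level coefficients are $w(p, k) = -\frac{1}{p}\binom{p}{k} \bmod p$, which is what the binomial expansion of $(x+y)^p$ gives; both the coefficient formula and the function names are assumptions of this example rather than statements taken from the text.

```python
from math import comb


def teichmuller_mod_p2(x: int, p: int) -> int:
    """Teichmuller lift of x in F_p, reduced mod p^2 (equals x^p mod p^2)."""
    return pow(x % p, p, p * p)


def w(p: int, k: int) -> int:
    """Assumed coefficient w(p, k) = -(1/p) * C(p, k) mod p, for 0 < k < p."""
    return (-(comb(p, k) // p)) % p


def first_carry(x: int, y: int, p: int) -> int:
    """sum_k w(p, k) x^k y^(p-k) in F_p (p-th roots are trivial on F_p)."""
    return sum(w(p, k) * pow(x, k, p) * pow(y, p - k, p) for k in range(1, p)) % p


def addition_rule_holds(p: int) -> bool:
    """Check tau(x) + tau(y) = tau(x + y) + tau(carry) * p modulo p^2."""
    mod = p * p
    for x in range(p):
        for y in range(p):
            lhs = (teichmuller_mod_p2(x, p) + teichmuller_mod_p2(y, p)) % mod
            carry = first_carry(x, y, p)
            rhs = (teichmuller_mod_p2(x + y, p)
                   + p * teichmuller_mod_p2(carry, p)) % mod
            if lhs != rhs:
                return False
    return True


results = {p: addition_rule_holds(p) for p in (2, 3, 5, 7)}
print(results)
```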
Thus, one can equivalently think of the universal sequence of the $w(p^n, k)$ as
being the characteristic $p$ analog of the Shannon information. Adopting this
viewpoint, one would then expect that the one-parameter deformation of the Witt
construction in characteristic one described in §7, which leads naturally from the
Shannon entropy to the non-extensive Tsallis entropy, may correspond to an analogous
deformation of the original p-adic Witt construction that leads to a notion of
non-extensive entropy and non-ergodic thermodynamics in characteristic $p$.

It should be mentioned that there are in fact interesting known q-deformations
of the Witt constructions, see for instance [40]. These can naturally be described
within the setting of $\Lambda$-rings (see [40]). This seems especially useful, in view of the
whole approach to $\mathbb{F}_1$-geometry based on $\Lambda$-rings, developed by James Borger in
[7] and [8], [9] (see also [36], [37] for other related viewpoints). However, a reader
familiar with the positive characteristic Witt construction will notice that Connes
and Consani's construction generalizes the p-Witt ring from a rather unconventional
expression for its addition. This is difficult to translate into the $\Lambda$-ring approach
to the Witt ring. A definition of $\Lambda$-rings in characteristic one which reproduces the
Witt rings considered in this paper would likely be interesting both geometrically
and physically.

This also suggests that identifying suitable analogs of other entropy functions
(Tsallis, Rényi, Kullback-Leibler) in characteristic $p$, via deformations of the ring
of Witt vectors, may also further our understanding of $\mathbb{F}_1$-geometry in the $\Lambda$-ring
approach.